What if you learned that someone in Kimberly, Australia, knows where you went biking last Sunday? If that doesn’t creep you out enough, what if that person also knows that you ran along the pavement in front of your house yesterday? Unless you fall into the Unconcerned category of Westin’s privacy segmentation, you have probably guessed that we are hinting at the scary state of location privacy today.
With ubiquitous apps like Google (search), Facebook, and Uber keeping records of your movements almost every second (with or without your permission), it’s hard to imagine how much location-based information services like Google Maps, Foursquare, and Strava possess.
What is our project about?
If these apps are collecting location-based data, what have we, as students, got to do with it?
Well, we didn’t do any rocket science in this project. We only took publicly available routes uploaded by a user and predicted one small detail about that user: his/her home (or work) location.
A home or work location might seem like common knowledge: your neighbors know it, your friends know it, and some people from your workplace know it. So what if we know it too?
The difference is that we, as students, can recognize you and know your personal details, while you, whose information we possess, have no clue about it.
In the extreme case of a crime committed using this location information, someone like us wouldn’t even be on the list of possible suspects.
Our analysis was done on the maps that users uploaded to Strava. If you don’t know what Strava is, the next section is here to help!
What is Strava?
Strava is a social fitness network that is primarily used to track cycling and running using GPS data.
Created by the millions of Strava athletes, segments mark popular stretches of road or trail (like your favorite local climb) and create a leaderboard of times set by every Strava athlete who has been there before. Strava has an active user base, with one million new users joining every 45 days and 8 million activities uploaded each day.
How do we do it?
The following are the steps for getting from a user to an address:
We made an application through the official Strava API.
This application was presented to Strava users, who authenticated it so that we could receive their access tokens. Why is this authentication required? It is a measure taken by Strava to limit the collection and processing of publicly available maps (and activities) of Strava users by anyone on the Internet (developers) through its official API.
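To make the authentication step concrete, here is a minimal sketch of building the link a user would open to authorize an app. It assumes Strava's standard OAuth 2.0 authorization endpoint and query parameters as we understand them from the public API documentation. The `build_authorize_url` helper, the client ID, the redirect URI, and the `activity:read` scope value are illustrative assumptions, not the values our application used.

```python
from urllib.parse import urlencode

# Strava's OAuth authorization endpoint (per the public API docs, as we understand them).
AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope="activity:read"):
    """Build the URL a user opens to grant the app access.

    The helper name and the default scope are assumptions for illustration.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",      # ask for an authorization code
        "approval_prompt": "auto",    # do not re-prompt users who already approved
        "scope": scope,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values; a real app uses its own registered client ID and callback.
    print(build_authorize_url("12345", "https://example.com/callback"))
```

After the user approves, the service redirects back with a short-lived code that the app exchanges for an access token. That exchange needs the app's client secret and a network call, so it is left out of this sketch.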
The following is a screenshot of the message we sent to active Strava users on Twitter.
Heat maps of all activities of a user.
Upon receiving users’ access tokens, we could access all their publicly available maps and activity traces.
We first converted these activity traces (in the form of polylines) to coordinates (latitudes and longitudes).
We then plotted these on geographical maps. Here is a sample of a user’s map.
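The polyline-to-coordinate conversion can be sketched as follows. This assumes the traces use the Google encoded polyline format with five decimal digits of precision, which is how Strava's API documents its map polylines. The `decode_polyline` function is our own illustrative helper, not a Strava API call.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google-style encoded polyline into a list of (lat, lng) tuples."""
    coords = []
    index = 0
    lat = 0
    lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        # Each point stores a latitude delta followed by a longitude delta.
        for is_lng in (False, True):
            shift = 0
            result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:                   # no continuation bit: value done
                    break
            # Undo the zig-zag encoding used for signed values.
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lng:
                lng += delta
            else:
                lat += delta
        coords.append((lat / factor, lng / factor))
    return coords


if __name__ == "__main__":
    # Sample string from Google's polyline format documentation.
    for point in decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@"):
        print(point)
```

The decoded list of points can then be handed to any mapping library to draw the trace or a heat map.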
Start and end location tags for each activity.
Now that we had every trace in the form of coordinates, we assigned start-of-activity and end-of-activity markers to all the activities of a particular user. We also counted how frequently these markers fell in nearby locations.
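The tagging step can be sketched as below, assuming each activity is already a list of (latitude, longitude) tuples. The `tag_endpoints` name and the tuple layout are our own choices for illustration.

```python
def tag_endpoints(traces):
    """Return ("start" | "end", lat, lng) markers for every non-empty trace."""
    markers = []
    for trace in traces:
        if not trace:
            continue  # skip activities without GPS points
        start_lat, start_lng = trace[0]
        end_lat, end_lng = trace[-1]
        markers.append(("start", start_lat, start_lng))
        markers.append(("end", end_lat, end_lng))
    return markers


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    demo = [[(1.0, 2.0), (1.5, 2.5), (3.0, 4.0)], []]
    print(tag_endpoints(demo))
```

Only the first and last points of each activity are kept because people tend to start and finish their rides and runs at the same few places, such as home or work.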
Aggregation and pinpointing.
Finally, we aggregated these coordinates (rounded to 4 decimal places) and assigned the highest probability score to the location where the rounded coordinates occurred most frequently.
We then converted that latitude-longitude pair back to an address, thus predicting the home location of the user.
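The rounding and frequency scoring can be sketched as below. It assumes the score is simply the share of start/end points that fall in the most common rounded cell. The `predict_location` helper and the sample coordinates are illustrative and are not taken from any real user.

```python
from collections import Counter


def predict_location(points, ndigits=4):
    """Return the most frequent rounded (lat, lng) cell and its share of all points."""
    if not points:
        raise ValueError("need at least one point")
    cells = Counter(
        (round(lat, ndigits), round(lng, ndigits)) for lat, lng in points
    )
    cell, count = cells.most_common(1)[0]
    return cell, count / len(points)


if __name__ == "__main__":
    # Made-up start/end points: two fall in the same 4-decimal cell.
    demo = [(10.12341, 20.56781), (10.12343, 20.56779), (10.2, 20.6)]
    print(predict_location(demo))
```

Rounding to four decimal places groups points into cells of roughly 11 m in latitude, so points recorded a few meters apart at the same doorstep mostly land in the same cell. Turning the winning cell into a street address requires a reverse-geocoding service, which is not shown here.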
Privacy Zones: A case in view
Strava has implemented a feature called privacy zones, specifically to protect its users from personalized attacks of this sort.
A privacy zone lets users hide all activity traces within a zone: a circular area centered on a chosen coordinate. This technique lets the user keep her/his house location and the nearby area from becoming public.
To see how effective privacy zones really are, we first ran our experiment on a user’s Strava data without the privacy zones feature enabled. Next, we predicted the user’s address with a privacy zone enabled.
From what we observed, the distance between the two predicted locations was small (~30 m).
This is because our prediction technique involves:
Aggregating multiple coordinates from users’ activity tracks.
Rounding coordinates (to 4 decimal places) to get a better estimate of the exact location.
This shows that a location predicted despite privacy zones, combined with some social engineering, can still lead an attacker to your correct address!
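The distance between two predicted locations can be measured with the haversine formula, sketched below. This assumes a spherical Earth with a mean radius of 6,371 km. The `haversine_m` helper is illustrative, and we are not claiming it is the exact calculation behind the ~30 m figure.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical approximation)


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    lat1, lng1 = math.radians(a[0]), math.radians(a[1])
    lat2, lng2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Two made-up predictions that differ by 0.0001 degrees of latitude.
    print(round(haversine_m((0.0, 0.0), (0.0001, 0.0)), 2), "m")
```

A gap of a few tens of meters corresponds to only a few 4-decimal cells, which is close enough to narrow the search to a handful of buildings.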
Concerns
Though Strava has privacy options like privacy zones and some settings in place, our experiment shows that these features are not strong enough on their own. Furthermore, here is an article which specifically describes how home locations can be predicted using privacy zones.
It may seem very scary, but this is the reality of today’s social media platforms. We can’t stop using these popular mainstream platforms just because of some small security and privacy vulnerabilities in them. A vulnerability, by definition, is a surface open to potential attacks (which may or may not be severe).
So, what can an ordinary user do? An ordinary user can only become aware of the possibility of such attacks and react accordingly, which is essentially the purpose of our project!
If you use Strava frequently and have a few activities uploaded, you could give our app a try: