It’s true! You are a unique flower. How many times have you allowed a company to collect anonymous location data about you in exchange for some service? Perhaps it’s a social alert system that lets your friends know when you are nearby. Or maybe you have your prolific photography tagged with GPS coordinates on Flickr.
Researchers Golle and Partridge at PARC (the Palo Alto Research Center) have found that such “anonymised” location traces are really not that anonymous. In their seminal 2009 paper, they show that with as little information as the location of your work and home, at a resolution as coarse as your city block, you can be singled out as one particular person. This doesn’t necessarily mean that you can be identified by name, but with a little effort, that connection is not far behind.
Your Home and Work Point to You
So who would have your work and home information? Location-based services often get only intermittent data, but if you examine the data from any one person, it’s not that hard to figure out which location they call home and which one is work. For example, you might check in to restaurants every day around lunchtime, and these would most likely be near your workplace. Similarly, you may post photos that were clearly taken at home and leave the GPS data attached, giving away your home location.
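To make that concrete, here is a rough sketch of how a service might guess home and work from a handful of timestamped sightings. It is not the method from the paper; the function name, the hour windows (night for home, lunchtime for work) and the block labels are all assumptions made up for illustration.

```python
from collections import Counter

def guess_home_and_work(sightings):
    """Guess (home, work) from (hour_of_day, block_id) pairs.

    Assumed heuristic: the block seen most often at night is home,
    and the block seen most often around lunchtime is near work.
    """
    sightings = list(sightings)  # allow generators; we scan twice
    night = Counter(b for h, b in sightings if h >= 22 or h < 6)
    lunch = Counter(b for h, b in sightings if 11 <= h < 14)
    home = night.most_common(1)[0][0] if night else None
    work = lunch.most_common(1)[0][0] if lunch else None
    return home, work

# Hypothetical check-ins and geotagged photos: (hour, block label)
sightings = [
    (23, "block-A"), (2, "block-A"),    # late-night photos
    (12, "block-B"), (13, "block-B"),   # lunchtime check-ins
    (12, "block-C"), (19, "block-D"),   # one-off outings
]
home, work = guess_home_and_work(sightings)
print(home, work)
```

Even with only six data points, the two most frequented blocks stand out, which is the whole problem.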
Revealing where one lives and works at the granularity of census blocks is uniquely identifying for a majority of the U.S. working population. – Golle and Partridge
Golle and Partridge used the LEHD Origin-Destination Dataset, a massive dataset compiled by the U.S. Census Bureau on where people live and work. LEHD includes almost all jobs except those classed as agricultural, federal or military. Of course, the database itself does not reveal personal information, but it provides a valuable source of data for analyzing how anonymous we really are. Turns out, “revealing where one lives and works at the granularity of census blocks is uniquely identifying for a majority of the U.S. working population”. Uniquely Identifying.
That’s pretty, well, NOT anonymous, and companies and services should stop pretending that it is. But what’s even more interesting is using their method to critically evaluate other “anonymous” data. Here’s a simple explanation of their method. Take the information you are providing as anonymous and estimate the number of people with the exact same data – this group of people is your anonymity set. The bigger the anonymity set, the more anonymous you actually are. The smaller the set, the easier it would be to actually identify you.
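The anonymity set idea is easy to try on any dataset you have lying around. The sketch below is my own minimal illustration, not the authors’ code: it counts how many records share each value and reports what fraction of people sit in a set of size one. The function names and the (home block, work block) pairs are made up.

```python
from collections import Counter

def anonymity_set_sizes(records):
    """For each record, return how many records share its exact value."""
    counts = Counter(records)
    return [counts[r] for r in records]

def fraction_identifiable(records, k=1):
    """Fraction of records whose anonymity set has at most k members."""
    sizes = anonymity_set_sizes(records)
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s <= k) / len(sizes)

# Hypothetical "anonymous" records: (home block, work block)
people = [
    ("H1", "W1"),
    ("H1", "W1"),   # shares a set of size 2 with the person above
    ("H1", "W2"),
    ("H2", "W1"),
    ("H3", "W3"),
]
print(anonymity_set_sizes(people))
print(fraction_identifiable(people))
```

Here three of the five people have an anonymity set of one, so their “anonymous” record points at exactly one person. Coarsening the data (say, using city instead of block) would merge sets and raise the sizes.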
Estimating this may not be easy in all cases, but in some it underlines how trivial identification really is. For example, with a name like Shimona, I’m amused by how many companies think they are giving me a vestige of privacy by using my first name and last initial. Since Shimona isn’t a terribly common name, my anonymity set is actually quite small. Put “shimona c” in Google and the first hit is a set of reviews by me on Yelp, from which you can find my current neighbourhood. Yes, there are one or two other hits that are my anonymity set buddies, but I’d be stupid to think that I get any anonymity from providing my first name and last initial. On the other hand, first initial plus last name works pretty well for me.
What anonymous data have you provided that was probably uniquely identifying after all?
Do you mind the loss of anonymity or use it to your advantage?
Photo on main page by Flickr user karlos92