As the web gets smarter, will our anonymity evaporate?

One of the most exciting things going on in webland today, I think, is the myriad technologies, user experiences, and computer-to-computer interactions that typically pass under the monikers of “Web 3.0” or “the semantic web.” There isn’t much general agreement on what precisely these terms mean (though I think the latter is more concrete). What many people envision as the future of the web, though, is an online environment in which data, text, and other forms of information and media are structured in ways that are machine-readable (if not machine-interpretable). That structure would open up all sorts of new possibilities for interoperability between websites, new forms of user-agent interaction, and generally a web experience less characterized by “dumb” websites.

All of this, in addition to its manifest benefits, would probably also present new opportunities for abuse, invasive marketing techniques, and threats to users’ privacy.

A glimpse of this last concern was provided recently by a paper from some Google researchers (“Could your social networks spill your secrets?”) that details how data from two different social networking sites (e.g., LinkedIn and Myspace) could be linked together to reveal the single person behind two different public profiles, even though the profiles are relatively anonymous and not directly linked to each other. From the New Scientist article:

That approach is dubbed “merging social graphs” by the researchers. In fact, it has already been used to identify some users of the DVD rental site Netflix, from a supposedly anonymised dataset released by the company. The identities were revealed by combining the Netflix data with user activity on movie database site IMDb.
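
To make the idea of combining two datasets a bit more concrete, here is a toy sketch in Python. It is not the researchers' method, and it is not how the Netflix data was actually analyzed; it only illustrates the general principle that records sharing enough overlapping details can be tied together. Every name, ID, title, date, and the `min_overlap` threshold below is made up for illustration.

```python
def link_records(anonymous, public, min_overlap=2):
    """Pair each anonymous ID with the public profile that shares the most
    (title, date) entries, keeping only pairs that meet min_overlap."""
    links = {}
    for anon_id, anon_items in anonymous.items():
        best_name, best_score = None, 0
        for name, public_items in public.items():
            # Count the entries the two records have in common.
            score = len(anon_items & public_items)
            if score > best_score:
                best_name, best_score = name, score
        # A single shared entry is weak evidence, so require a minimum overlap.
        if best_score >= min_overlap:
            links[anon_id] = best_name
    return links


# Hypothetical "anonymised" rental records: an opaque ID plus (title, date) pairs.
anonymous = {
    "user_1041": {("Brazil", "2005-03-02"), ("Heat", "2005-03-09"), ("Alien", "2005-04-01")},
    "user_2207": {("Amelie", "2005-06-11"), ("Heat", "2005-07-19")},
}

# Hypothetical public review profiles on another site.
public = {
    "alice_reviews": {("Brazil", "2005-03-02"), ("Alien", "2005-04-01")},
    "bob_reviews": {("Heat", "2005-07-19")},
}

links = link_records(anonymous, public)
print(links)
```

In this made-up data, only one anonymous ID overlaps with a public profile on two or more entries, so it is the only one that gets linked. The uncomfortable part is how little is needed: neither dataset contains a name alongside the "anonymous" ID, yet the combination does.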

December 2009: As an addendum to this article, I direct your attention to “project gaydar”.