Monday, August 6, 2007

Google And The Personalized Search -- All's Well.. Or Orwell


Scott Buresh
Post Chronicle
Monday Aug 6, 2007

You go to Google and enter your search term. Big Brother, the totalitarian character from George Orwell's novel 1984, watches with detached interest. You see, to Big Brother, you are only a number - but he'd like to know as much about you as he can. Knowing you allows Big Brother to do many things - both good and evil.

All right, enough of the "Big Brother" comparison - it's been done many times before (and done many times better). However, there is an important central point to be made about personalized search. Google is now (and has been for some time) collecting data on individual users, and it is assuming that users will trust it with this data to "do no evil," in the spirit of its famous "Don't be evil" motto. Only time will tell whether the trust is well-placed, or whether people are willing to trust search engines with this type of data at all.

The basic principle behind personalized search is simple. When you go to Google and type in a search query, Google stores the data. As you return to the engine, a profile of your search habits is built up over time. With this information, Google can understand more about your interests and serve up more relevant search results.

For instance, let's say that your search queries have shown an interest in sport fishing, while your neighbor's have shown an interest in musical instruments. Over time, as these preferences become clear to the engine, your personalized search results for the term "bass" will consist largely of results that cover the fish, while your neighbor's results for "bass" will primarily cover the musical instrument.
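The "bass" example can be made concrete with a small sketch. This is only an illustration of the general idea of re-ranking by search history, not a description of how Google actually does it; the topic keywords, the sample results, and the function names build_profile and personalize are all hypothetical.

```python
from collections import Counter

# Hypothetical topic keywords, invented for this illustration.
# A real engine would infer interests in far more sophisticated ways.
TOPIC_KEYWORDS = {
    "fishing": {"fishing", "lure", "trout", "boat", "tackle"},
    "music": {"guitar", "amplifier", "band", "strings", "drum"},
}

# Hypothetical baseline results for the query "bass": (title, topic).
BASELINE = [
    ("Bass guitar buying guide", "music"),
    ("Largemouth bass fishing tips", "fishing"),
    ("Bass (fish) overview", "fishing"),
    ("Upright bass lessons", "music"),
]


def build_profile(past_queries):
    """Count how often each topic's keywords appear in past queries."""
    profile = Counter()
    for query in past_queries:
        words = set(query.lower().split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            profile[topic] += len(words & keywords)
    return profile


def personalize(results, profile):
    """Move results on the user's most-searched topics to the top.

    sorted() is stable, so results with equal interest scores keep
    their baseline order.
    """
    return sorted(results, key=lambda result: -profile[result[1]])


you = build_profile(["best trout lure", "fishing boat rental"])
neighbor = build_profile(["guitar strings", "drum kit for band practice"])

your_results = personalize(BASELINE, you)
neighbor_results = personalize(BASELINE, neighbor)
```

With these made-up inputs, your list leads with the fishing pages and your neighbor's leads with the instrument pages, even though both of you typed the same word.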

At present, you need to have signed up for a Google service for your results to be personalized. Such services include Gmail, AdWords, Google Toolbar, and many others. By default, as long as you are signed in to one of these programs, your personal search data will be collected. The term "at present" is used because Google certainly could implement personalized search on any user of the engine, regardless of whether he or she has a Google account. Google already places a cookie, or unique identifier, on the machine of anyone who types in a search query on Google - it would not be hard for them to use that information, rather than the Google account, to collect individual user data and personalize results. It is quite possible that Google is testing the waters of personalized search with people who have opted in to one of its services, and will expand the system to all users if there is limited uproar or government intervention.
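To see why switching from account-based to cookie-based tracking would be a small step, consider the following minimal sketch. It assumes a hypothetical server that keys search histories by an identifier; the request dictionary, the field names account_id and cookie_id, and the functions identifier_for and record_search are invented for illustration and do not reflect any real Google system.

```python
import uuid

# Hypothetical in-memory store: identifier -> list of past queries.
profiles = {}


def identifier_for(request):
    """Pick the key under which this visitor's searches are stored.

    Signed-in users are keyed by account. Everyone else is keyed by a
    cookie value, which is created on the first visit if it is missing.
    """
    if request.get("account_id"):
        return "account:" + request["account_id"]
    if not request.get("cookie_id"):
        # Simulates setting a new cookie on the visitor's browser.
        request["cookie_id"] = uuid.uuid4().hex
    return "cookie:" + request["cookie_id"]


def record_search(request, query):
    """Append the query to the visitor's history and return the key used."""
    key = identifier_for(request)
    profiles.setdefault(key, []).append(query)
    return key


# A visitor with no account still accumulates a history under the cookie.
anonymous_browser = {}
first_key = record_search(anonymous_browser, "bass")
second_key = record_search(anonymous_browser, "bass lures")

# A signed-in visitor is keyed by the account instead.
signed_in_key = record_search({"account_id": "example-user"}, "bass amp")
```

In this sketch the only difference between the two cases is which identifier is used as the key; the stored history looks the same either way.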

For search engine optimization firms, the major shift brought about by personalized search will be in how they report Google ranking data to clients. When collecting this data, they will have to run their queries from a "clean" machine - that is, one that has no Google programs or cookies on it. The baseline results that are reported to the client will essentially be a snapshot of what a search engine user would see if he or she had no Google software installed. The good news is that Google account holders who have shown an interest in certain products and services will likely see results more favorable to the client than the baseline results indicate, since personalized search ensures that their search histories will be reviewed and the results likely skewed toward the client's industry. The bad news is that the search engine optimization firm will be hard-pressed to demonstrate this. In addition, a client that uses a Google program will almost certainly see results on its own machines that do not match the results the firm is reporting (although the client's machines should show better results, for the same reasons cited above).

Some people find the practice of storing information for personalized search purposes disturbing; others find the end result to be useful (still others find themselves experiencing an odd combination of both reactions). In defense of the engines, it is not as if they are building a dossier on individuals - again, you are only a number to them. However, the potential for misuse of the data is fairly high.

There are many advertising firms out there already that use the cookies they have placed on your machine to figure out which ads will have the best effect on you. If you've ever been on a website and seen a banner ad that is directly related to something you have been researching lately, it is most likely not a coincidence. The ad platform can read its own cookie on every site where it serves ads, so it simply looked up which topics you had shown an interest in and dropped in a related ad. Search engines have been buying firms with this technology lately; notable recent purchases include that of DoubleClick by Google and aQuantive by Microsoft. There seems to be little doubt that your search history will be combined with existing ad-serving technology to deliver even more relevant ads. Whether this constitutes misuse is debatable - some people seem to have no problem with it, while it makes many others fairly uneasy.

Privacy issues that arise from personalized search are also a major concern. The EU recently announced that it is probing how long Google stores user information (this probe was subsequently extended to include all search engines). AOL committed a serious blunder last year when it released search data from roughly 650,000 of its users, and it was discovered that it was fairly easy to identify many people by the search terms that they used. (Anybody ever "ego surf" - that is, type your own name into a search engine to see what comes up? If so, you wouldn't be hard to spot.) In addition, since the IP address of the computer creating the query is also reportedly tracked, a court order forcing the engine and the ISP (Internet Service Provider) to provide specific search data on individuals is a distinct possibility - the technology required to comply with such a demand is already in use.

Unless the government intervenes, the question will probably be decided by personal preference. As it becomes more widely known that Google (and other engines) stores this type of data to enable personalized search, many users will take measures to block its use.

Are the search engines that collect this data "doing no evil"? The answer, I believe, will depend on each individual's definition of evil. In the meantime, don't be surprised when you type in a search query and the engine seems to be reading your mind. It isn't, really - it's merely parsing your memories.