
June 2007 Archives

June 3, 2007

There is Still Lots of Room to Improve Search

Today's New York Times Online Edition ran an article on Google and its search algorithm: "Google Keeps Tweaking Its Search Engine" by Saul Hansell. I read the article and found little in it that was new, despite Mr. Hansell's claim that his day at Google's headquarters afforded key engineers an opportunity to "[explain] more than they ever have before in the news media about how their search system works."

What was new for me was the apparent reality (implicit in what Mr. Hansell has to say, but never explicitly stated) that Page Rank is merely one contributor to the order of results served up by Google's systems in response to queries. I got to this point from Mr. Hansell's statement that "Mr. Singhal has developed a far more elaborate system for ranking pages. . . " If he has, in fact, gone beyond Page Rank, then I am not as concerned for clients with slipping Page Rank as I would otherwise be.

But Page Rank is not the point of this entry. The point is that, from my perch, Mr. Hansell's suggestion that Google has substantially improved the responses it serves up to queries is either (1) not the case, or (2) fine, but we still have lots and lots and lots of improvements to make. If either (1) or (2) is correct, there are plentiful opportunities in this online search arena.

Consider this query:

what does "sourceid=navclient" mean

What I am looking for is a definition of the phrase "sourceid=navclient." This phrase appears frequently in access logs that I review.

Google's systems never came close to serving a result at all pertinent to the query. The first result, Sacramento Blogs | SacStarts, merely includes the exact phrase because the blog is now findable with Google.

So the vaunted algorithm can't do much with the term "what"; or with the phrase "what does"; or with the "distributed" phrase "what does <> mean" (with a specific "word", "sourceid=navclient", plugged into the middle, between the angle brackets).
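As a rough illustration of what recognizing that "distributed" phrase could involve, here is a minimal sketch in Python. It says nothing about how Google's systems actually work; the pattern and the function name extract_definition_term are hypothetical, made up for this example. It only shows that the template "what does <> mean" can be detected and the term in the middle pulled out, so that a search system could treat the query as a request for a definition.

```python
import re
from typing import Optional

# Hypothetical pattern for the template: what does <term> mean
# The term may be wrapped in double quotes, and a trailing "?" is allowed.
DEFINITION_QUERY = re.compile(
    r'^\s*what\s+does\s+"?(?P<term>.+?)"?\s+mean\s*\??\s*$',
    re.IGNORECASE,
)


def extract_definition_term(query: str) -> Optional[str]:
    """Return the term being asked about, or None if the query
    does not follow the 'what does <term> mean' template."""
    match = DEFINITION_QUERY.match(query)
    return match.group("term") if match else None


if __name__ == "__main__":
    print(extract_definition_term('what does "sourceid=navclient" mean'))
    print(extract_definition_term("sourceid=navclient access log"))
```

A query that fits the template yields the term between the quotes; anything else yields None and could fall through to ordinary keyword matching.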

The big picture is that Google does not serve accurate results for human language queries and, in fact, tries to fight the process by prodding posters not to use natural phraseology. For example, if you include words such as "a", "the", or "to" in your query so that it conforms to natural sentence structure, the systems' response will tell you that you needn't bother with those words since they are already included. This response is, on the one hand, smug and, on the other, highly negative, as human beings tend to communicate in sentences. In the case of my query, adding a question mark at the end did not help. Google's systems still did not "get it". The fact that the query conformed to the syntax of a question was completely ignored by the lexical program.

In sum, Mr. Hansell unfortunately sees a big deal in some stuff that is still far short of the big deal we all need in order to improve our searching online.

The flip side is that there is still tons of room for entrepreneurs who may want to step up to the plate and duke it out with Godzilla.

©, 2007, Mike Blonder. All Rights Reserved. No Reprint without Prior Permission.


About June 2007

This page contains all entries posted to Mike Blonder: Thoughts on Technology, and the Web in June 2007. They are listed from oldest to newest.

