Google’s next step in search?
June 23rd, 2007 by Shiva
Google is currently focusing on what is called “topicality”. A few years back, search was mere word matching, and people were glad to find any information on what they searched for. According to a New York Times report, Google is working on concepts it calls signals and classifiers. I do not want to get into terminology, but what Google wants to achieve is to take another step towards meaningful search. What happens when a user searches for “orange”: should the engine bring back information on the color or on the fruit?
I call this the entity problem. Even http://challenge.spock.com/ has announced a competition to address this issue ($50,000?!). I have worked on a similar project to address the entity issue, though it was in the Intellectual Property industry. The same architecture can be applied to a search engine as well, though it would definitely be a bit more complicated. I would assume Google's existing search algorithm is good enough to index and fetch appropriate sites based on various factors such as signals, classifiers, PageRank, referrals, age of content (http://www.waybackmachine.org/ is a great site), etc.
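To make the “orange” question concrete, here is a minimal sketch in Python of one naive way to pick a meaning: look at the other words in the query. This is not how Google or the Spock challenge entrants do it. The `SENSES` table, its cue words, and the `disambiguate` function are all hypothetical names made up for illustration.

```python
# Hypothetical sense profiles: hand-picked cue words for each meaning of "orange".
SENSES = {
    "orange": {
        "color": {"paint", "shade", "hex", "rgb", "dye"},
        "fruit": {"juice", "peel", "vitamin", "recipe", "tree"},
    }
}


def disambiguate(query):
    """Return {ambiguous_term: best_sense or None} for a query string."""
    words = query.lower().split()
    result = {}
    for term in words:
        if term not in SENSES:
            continue
        # Every other word in the query counts as context for this term.
        context = set(words) - {term}
        # Score each sense by how many of its cue words appear in the context.
        scores = {sense: len(cues & context) for sense, cues in SENSES[term].items()}
        best = max(scores, key=scores.get)
        # With no supporting context the sense stays undecided.
        result[term] = best if scores[best] > 0 else None
    return result


print(disambiguate("orange juice recipe"))
print(disambiguate("orange"))
```

A bare query like “orange” stays undecided in this sketch, which is exactly the case where the engine would have to guess what the user is thinking.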
Eh, but that is not going to be enough. Google has to understand what the user thinks, more or less like Artificial Intelligence. You know what, not even AI: Google has to be a psychic.
Google has so far relied on websites in general (all websites throughout the world, really). Key websites such as Wikipedia, Technorati, Flickr, and everything Web 2.0 and Web 3.0 (“web cube” in future) will play a major role from here on.
Wikipedia’s major traffic comes through Google, but Google isn’t doing Wikipedia a favor; rather, Wikipedia earns it by being information and content specific. You may recall the Wikipedia vs Dictionary post that I wrote earlier. Wikipedia is dynamic, human edited, and precise, with accurate information. A dictionary is mere words. So where should Google focus more now: on content or on words?
Google has more challenges to address now:
-3. Interpret websites (extract the words, look up their synonyms, and index both)
-2. Current and old topic search (trends)
-1. Read my mind, or voice-activated search
0. Utilize Google Groups search (discussion threads) and answers.google.com (I liked this site, no idea why it got dropped)
1. Efficient content-oriented search
2. Entity-related (topicality) search
3. Avoid and identify spamdexing
4. Update and synchronize data servers
5. Handle ever-growing cache data
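The first item in the list (get the words, get their synonyms, index) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYNONYMS` table is a tiny hypothetical stand-in for a real thesaurus, and `build_index` and `search` are made-up names, not anything Google has described.

```python
from collections import defaultdict

# Hypothetical synonym table; a real system would need a far larger thesaurus.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "film": {"movie"},
    "movie": {"film"},
}


def build_index(pages):
    """Map each word, and each of its synonyms, to the ids of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
            # Also file the page under every known synonym of the word.
            for synonym in SYNONYMS.get(word, ()):
                index[synonym].add(page_id)
    return index


def search(index, word):
    """Return the sorted page ids indexed under a single word."""
    return sorted(index.get(word.lower(), set()))


pages = {"a": "Used car prices", "b": "Classic movie reviews"}
index = build_index(pages)
print(search(index, "automobile"))
print(search(index, "film"))
```

Expanding synonyms at index time keeps lookups simple but makes the index bigger, which feeds straight into the last item on the list about ever-growing data.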
I think that’s enough for today. Oh yes, Google is my favourite search engine so far.