What is a Web Crawler?
November 1st, 2006 by Shiva
Web crawler, robot, bot, spider: they are all one and the same. Every search engine has its own crawler, a software application (or agent) that visits web sites periodically. Most legitimate bots crawl a website's meta tags, keywords, description and content, as well as its URLs, and store them in their own database (a local store). So whenever you search on Google/MSN/Yahoo, it is not a live search of the web: the search engine indexes whatever data its crawler has collected and uses its own logic and algorithms to determine which sites to show for the specific “searched keywords”.
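To make the "index it, then search it" idea concrete, here is a minimal sketch of a word-to-page index. The names build_index and search, the example URLs and the page text are all made up for illustration; real search engines use far more elaborate indexing and ranking than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            # Strip simple punctuation so "crawlers." matches "crawlers".
            index[word.strip(".,!?\"'")].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (no ranking)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled data: url -> page text.
pages = {
    "http://a.example/": "Award winner blog about web crawlers.",
    "http://b.example/": "A blog about cooking.",
}
index = build_index(pages)
print(sorted(search(index, "blog crawlers")))
```

The search never touches the live pages; it only looks at what was stored at crawl time, which is the point made above.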
The crawler cannot see an image (it crawls it and stores it in a binary format). That is the reason why you have to provide alternate text for an image through the alt attribute, e.g. img alt="award winner". So if you have an image on your website, check that it has alternate text; in some browsers you will see it as a tooltip when you mouse over the image. If an image has no alternate text, the web site is not XHTML compliant. The SE (search engine) might not punish the site for it, but you will not get any brownie points either. And if you have a word like “blog” repeated on your site a thousand times (spaced next to each other) and wonder why the site is not showing up in search engines, then it has probably been penalized for keyword stuffing.
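If you would rather not mouse over every image by hand, the alt check can be sketched with Python's standard html.parser module. The class name MissingAltFinder and the sample markup are assumptions for illustration only; this is a rough check, not a full XHTML validator.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> tag that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            # An empty alt="" is allowed for purely decorative images,
            # so only a completely absent attribute is reported.
            if "alt" not in attrs:
                self.missing.append(attrs.get("src", "(no src)"))


finder = MissingAltFinder()
finder.feed('<img src="award.png" alt="award winner"><img src="logo.png">')
print(finder.missing)
```

Feed it the HTML of your own page and every image it lists is one that still needs alternate text.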
Did I deviate from the answer? A little bit, I think. Okay, so a bot is nothing but a tool, pre-written by a search engine: it collects links from sites, visits them, grabs the URLs, images, content, etc., then visits the other sites that are mentioned on your web site, and goes on like a chain reaction. The robot also comes back to your website and checks whether anything has changed; if there is no change for a long time, the frequency of its visits slows down.
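The "visit a page, grab its links, visit those too" chain described above can be sketched as a simple breadth-first crawl. To keep it runnable without network access, the pages come from a made-up dictionary called FAKE_WEB; the names crawl, LinkCollector and the example URLs are hypothetical, and a real bot would also respect robots.txt, rate limits and revisit schedules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: visit a page, store it, queue its links."""
    store = {}                     # url -> page content (the "local store")
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:           # page could not be fetched; skip it
            continue
        store[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)    # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# A tiny fake web so the sketch runs offline.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": "<p>No links here</p>",
}

pages = crawl("http://a.example/", FAKE_WEB.get)
print(sorted(pages))
```

Starting from one page, the crawl reaches the second site only because the first one links to it, and the seen set keeps it from going round in circles.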
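Last but not least, make sure you or your web hosting company has support for the If-Modified-Since HTTP header enabled on your web server. The bot sends this header with the date of its last visit, and the server answers "304 Not Modified" if nothing has changed since then; that is how bots get to know whether anything new is going on in your site or not. Got questions? Have doubts? Write to us.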
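For the curious, here is a rough sketch of what a web server does with If-Modified-Since. It only simulates the decision; the function name respond and the dates are made up for illustration, and a real server such as Apache or IIS handles this for you once it is configured.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(page_modified, if_modified_since=None):
    """Return (status, headers) the way a server with conditional GET might."""
    headers = {"Last-Modified": format_datetime(page_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None           # unparseable header: ignore it, send the page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if page_modified.replace(microsecond=0) <= since:
                return 304, headers    # Not Modified: no body needs to be sent
    return 200, headers                # OK: the full page is sent


modified = datetime(2006, 11, 1, 12, 0, 0, tzinfo=timezone.utc)

# First visit: the bot has no date yet, so it gets the full page.
status, headers = respond(modified)
print(status)

# Next visit: the bot sends back the Last-Modified value it remembered.
status, _ = respond(modified, headers["Last-Modified"])
print(status)
```

When the answer is 304 the bot knows nothing has changed and can move on without downloading the page again.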