What is a Robots.txt File?
Search engines look at millions of web pages to come up with search results. They do this with what we call "search engine spiders." This makes sense: spiders crawling around on the Web. Another word for them is "robots," because they are simply unmanned programs that gather data automatically. I can't help but picture them as the characters in the animated movie "Robots."
In the beginning, these robots spidered every page and every file attached to the Web. This caused problems for both the search engines and the people using them. Pages that really aren't worth looking at, such as header files meant to be included in every page on a site, were being spidered and showing up in search results. Have you ever searched on Google and gotten a partial page as a result?
The solution was for Google and other search engines to begin looking for a robots.txt file in the root folder of each site (http://www.mydomain.com/robots.txt) to determine what should and shouldn't be crawled. This convention is called the Robots Exclusion Standard. This simple text file, created with Notepad or another plain text editor, lets you tell robots not to spider certain folders in your site. It is a request rather than a lock: the major search engines' robots honor it, but a badly behaved robot can simply ignore it. The result is happier visitors who come to your site from search engines and land only on full pages you want them to see, not partial, test, or script pages you don't.
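Here is a minimal Python sketch, using only the standard library's urllib.parse, of where a robot looks for the file. Whatever page it starts from, it keeps the scheme and host, throws away the rest of the URL, and asks for /robots.txt at the root. The function name robots_txt_url and the sample page path are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check before fetching page_url.

    robots.txt always lives at the root of the host, so the page's path,
    query string and fragment are discarded; scheme, host and port are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page deep inside the site; the robot still checks the root.
    print(robots_txt_url("http://www.mydomain.com/products/widgets.htm?id=7"))
```

Because the file is tied to the host, a page on a subdomain such as blog.mydomain.com would be governed by blog.mydomain.com/robots.txt, not by the one on www.mydomain.com.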
Let's look at some examples to get started:
This allows all spiders to spider all pages on your site. The * is a wildcard that means “all spiders,” and an empty Disallow line means nothing is off-limits.
User-agent: *
Disallow:
This is the opposite of the above example: it tells all spiders NOT to spider any part of your site, because the single slash stands for the root folder and everything under it. You might want this if you have a test site, for example, that is not live yet.
User-agent: *
Disallow: /
This example tells all robots to stay out of the cgi-bin and images folders. Each folder gets its own Disallow line.
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
This example tells only the WebFerret robot not to spider the page ferret.htm. It’s only an example; I have nothing against WebFerret. The user agent name for Google is Googlebot.
User-agent: WebFerret
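Disallow: /ferret.htm
It is important that the file be a plain text file; do not use Microsoft Word to create it. And be careful how you type: it's safest to make your file look exactly like the examples above, with a capital only on the first letter of each field name, a colon after it, and a leading slash on every path. Paths are case-sensitive, so /Images/ and /images/ are different folders. A poorly done robots.txt file could harm your site more than help it; a stray "Disallow: /", for example, tells every compliant robot to skip your whole site. For a cool online robots.txt file validator, go to http://www.searchengineworld.com/cgi-bin/robotcheck.cgi.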
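To check rules like these locally, Python's standard library includes urllib.robotparser, which parses robots.txt text and reports whether a named robot may fetch a given URL. The sketch below runs the four examples above through it, plus the WebFerret rule written without a leading slash on ferret.htm, since the standard expects paths to start with /. The host www.mydomain.com, the page paths, and the helper names make_parser and is_allowed are hypothetical, and real search engines use their own parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The article's examples as robots.txt text. The host and the page paths
# checked below are hypothetical.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_FOLDERS = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/\n"
BLOCK_FERRET = "User-agent: WebFerret\nDisallow: /ferret.htm\n"
# Same rule without the leading slash. Full URLs have paths starting with "/",
# so Python's parser finds no match for this rule.
FERRET_NO_SLASH = "User-agent: WebFerret\nDisallow: ferret.htm\n"

BASE = "http://www.mydomain.com"


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Would a compliant robot named `agent` fetch BASE + path?"""
    return make_parser(robots_txt).can_fetch(agent, BASE + path)


if __name__ == "__main__":
    checks = [
        ("allow all", ALLOW_ALL, "Googlebot", "/index.htm"),
        ("block all", BLOCK_ALL, "Googlebot", "/index.htm"),
        ("folders", BLOCK_FOLDERS, "Googlebot", "/images/logo.gif"),
        ("folders", BLOCK_FOLDERS, "Googlebot", "/index.htm"),
        ("ferret", BLOCK_FERRET, "WebFerret", "/ferret.htm"),
        ("ferret", BLOCK_FERRET, "Googlebot", "/ferret.htm"),
        ("no slash", FERRET_NO_SLASH, "WebFerret", "/ferret.htm"),
    ]
    for label, rules, agent, path in checks:
        verdict = "allowed" if is_allowed(rules, agent, path) else "blocked"
        print(f"{label:10} {agent:10} {path:18} {verdict}")
```

Run as a script, it should report the allow-all check, the /index.htm folder check, the Googlebot ferret check and the slash-less WebFerret check as allowed, and the rest as blocked. That last result shows why the leading slash matters, at least for Python's parser: without it, WebFerret is not kept away from ferret.htm at all.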
As an e-commerce consultant for over three years, and Web designer for over ten, Chuck Lasker has been helping individuals and organizations utilize the Internet in almost every arena. Chuck's e-newsletter and blog, The MerchantHowTo.com Report, at http://www.MerchantHowTo.com, is free and popular amongst e-store owners.
Article Source: http://EzineArticles.com/?expert=Chuck_Lasker






