Creating a Robots.txt file
By Sumantra Roy

Some people believe that they should create different pages for different search engines, each page optimized for one keyword and for one search engine. Now, while I don't recommend that people create different pages for different search engines, if you do decide to create such pages, there is one issue that you need to be aware of.

These pages, although optimized for different search engines, often turn out to be pretty similar to each other. The search engines now have the ability to detect when a site has created such similar-looking pages and are penalizing or even banning such sites. In order to prevent your site from being penalized for spamming, you need to prevent each search engine's spider from indexing pages which are not meant for it, i.e. you need to prevent AltaVista from indexing pages meant for Google and vice versa. The best way to do that is to use a robots.txt file.

You should create the robots.txt file using a text editor like Windows Notepad. Don't use your word processor to create such a file.

Here is the basic syntax of the robots.txt file:

    User-Agent: [Spider Name]
    Disallow: [File Name]

For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html residing in the root directory of the server, you would write

    User-Agent: Scooter
    Disallow: /myfile1.html

To tell Google's spider, called Googlebot, not to spider the files myfile2.html and myfile3.html, you would write

    User-Agent: Googlebot
    Disallow: /myfile2.html
    Disallow: /myfile3.html

You can, of course, put multiple User-Agent statements in the same robots.txt file.
Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the files myfile2.html and myfile3.html, you would write the two records one after the other, separated by a blank line:

    User-Agent: Scooter
    Disallow: /myfile1.html

    User-Agent: Googlebot
    Disallow: /myfile2.html
    Disallow: /myfile3.html

If you want to prevent all robots from spidering the file named myfile4.html, you can use the * wildcard character in the User-Agent line, i.e. you would write

    User-Agent: *
    Disallow: /myfile4.html

However, under the original robots.txt standard you cannot use the wildcard character in the Disallow line. Some spiders accept it as an extension, but you should not rely on every spider doing so.

Once you have created the robots.txt file, you should upload it to the root directory of your domain. Uploading it to any sub-directory won't work - the robots.txt file needs to be in the root directory.

I won't discuss the syntax and structure of the robots.txt file any further - you can get the complete specifications from here.

Now we come to how the robots.txt file can be used to prevent your site from being penalized for spamming in case you are creating different pages for different search engines. What you need to do is to prevent each search engine from spidering pages which are not meant for it.

For simplicity, let's assume that you are targeting only two keywords: "tourism in Australia" and "travel to Australia". Also, let's assume that you are targeting only three of the major search engines: AltaVista, HotBot and Google.

Now, suppose you have followed this convention for naming the files: each page is named by joining the individual words of the keyword for which the page is being optimized with hyphens. To this is added the first two letters of the name of the search engine for which the page is being optimized.
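The naming convention just described can be sketched in a few lines of Python. This is only an illustration: the function name page_filename is hypothetical, and lowercasing the keyword and adding the .html extension are assumptions about how the convention is meant to be applied.

```python
def page_filename(keyword, engine):
    """Build a page name from a keyword and a search engine name.

    The words of the keyword are joined by hyphens, and the first two
    letters of the engine name are appended as a suffix.
    Lowercasing and the ".html" extension are assumptions.
    """
    words = keyword.lower().split()
    suffix = engine[:2].lower()
    return "-".join(words + [suffix]) + ".html"


if __name__ == "__main__":
    for engine in ("AltaVista", "HotBot", "Google"):
        for keyword in ("tourism in Australia", "travel to Australia"):
            print(engine, page_filename(keyword, engine))
```

Generating the names from one function, rather than typing them by hand, makes it less likely that a page name and its robots.txt entry drift apart.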
Hence, the files for AltaVista are

    tourism-in-australia-al.html
    travel-to-australia-al.html

The files for HotBot are

    tourism-in-australia-ho.html
    travel-to-australia-ho.html

The files for Google are

    tourism-in-australia-go.html
    travel-to-australia-go.html

As I noted earlier, AltaVista's spider is called Scooter and Google's spider is called Googlebot. A list of spiders for the major search engines can be found here. Now, we know that HotBot uses Inktomi, and from this list we find that Inktomi's spider is called Slurp. Using this knowledge, here's what the robots.txt file should contain:

    User-Agent: Scooter
    Disallow: /tourism-in-australia-ho.html
    Disallow: /travel-to-australia-ho.html
    Disallow: /tourism-in-australia-go.html
    Disallow: /travel-to-australia-go.html

    User-Agent: Slurp
    Disallow: /tourism-in-australia-al.html
    Disallow: /travel-to-australia-al.html
    Disallow: /tourism-in-australia-go.html
    Disallow: /travel-to-australia-go.html

    User-Agent: Googlebot
    Disallow: /tourism-in-australia-al.html
    Disallow: /travel-to-australia-al.html
    Disallow: /tourism-in-australia-ho.html
    Disallow: /travel-to-australia-ho.html

When you put the above lines in the robots.txt file, you instruct each search engine not to spider the files meant for the other search engines.

When you have finished creating the robots.txt file, double-check to ensure that you have not made any errors anywhere in it. A small error can have disastrous consequences - a search engine may spider files which are not meant for it, in which case it can penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.

A useful tool to check the syntax of your robots.txt file can be found here. While it will help you correct syntactical errors in the robots.txt file, it won't help you correct any logical errors, for which you will still need to go through the robots.txt file thoroughly, as mentioned above.
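Part of the check for logical errors can be automated. The sketch below is one possible approach, not a tool the article refers to: it uses Python's standard urllib.robotparser module to parse the example file and confirm that each spider may fetch only its own pages. The names ROBOTS_TXT, OWN_SUFFIX, KEYWORDS and find_logical_errors are hypothetical, and Python's parser only approximates how a real spider reads the file.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the article, with records separated by blank lines.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
"""

# Which file suffix belongs to which spider (from the article's example).
OWN_SUFFIX = {"Scooter": "al", "Slurp": "ho", "Googlebot": "go"}
KEYWORDS = ["tourism-in-australia", "travel-to-australia"]


def find_logical_errors(robots_txt):
    """Return (spider, path, allowed) for every spider/page pair whose
    permission differs from what the naming convention intends."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    errors = []
    for spider, own in OWN_SUFFIX.items():
        for keyword in KEYWORDS:
            for suffix in OWN_SUFFIX.values():
                path = f"/{keyword}-{suffix}.html"
                allowed = parser.can_fetch(spider, path)
                # A spider should be allowed to fetch only its own pages.
                if allowed != (suffix == own):
                    errors.append((spider, path, allowed))
    return errors


if __name__ == "__main__":
    problems = find_logical_errors(ROBOTS_TXT)
    if not problems:
        print("No logical errors found.")
    for spider, path, allowed in problems:
        state = "can" if allowed else "cannot"
        print(f"{spider} {state} fetch {path}, which is not intended")
```

A check like this complements a careful read-through rather than replacing it, since it only tests the spider names and page names you list in the script.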
Article Source: http://www.articlealley.com/http://sumantraroy.articlealley.com/creating-a-robotstxt-file-88.html

Occupation: Search Engine Positioning Specialists

About the Author: Article by Sumantra Roy. Sumantra is one of the most respected and recognized search engine positioning specialists on the Internet. For more articles on search engine placement, subscribe to his 1st Search Ranking Newsletter by sending a blank email to mailto:1stSearchRanking.999.99@optinpro.com or by going to http://www.1stSearchRanking.net