How to keep robots out of your web site
Author: Roberto Bonomi

THE ROBOTS.TXT FILE

Search engines were created to help people find information on the Internet quickly, and they acquire much of that information through robots (also known as spiders or crawlers) that look for web pages on their behalf. These robots explore the web, looking for and recording all kinds of information. They usually start from URLs submitted by users, from links they find on other web sites, from sitemap files, or from the top level of a site. Once a robot reaches a site's home page, it recursively visits all the pages linked from that page. A robot can also try to find every page it can on a particular server. When it finds a web page, it indexes the title, the keywords, the text, and so on.

Sometimes, however, you might want to prevent search engines from indexing some of your content, such as news postings or specially marked web pages (for example, affiliate pages). Keep in mind that whether an individual robot complies with these conventions is purely voluntary.

ROBOTS EXCLUSION PROTOCOL

If you want robots to stay out of some of your web pages, you can ask them to ignore the pages you don't want indexed by placing a robots.txt file in the root directory of your web site. For example, if you have a directory called e-books and you want to ask robots to keep out of it, your robots.txt file should read:

    User-agent: *
    Disallow: /e-books/

If you don't have enough control over your server to set up a robots.txt file, you can instead add a META tag to the head section of an HTML document. For example, the following tag tells robots not to index a particular page and not to follow its links:

    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

Support for the META tag among robots is less widespread than support for the Robots Exclusion Protocol, but most major web indexes currently support it.
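From the robot's side, honoring these two conventions comes down to two checks. The sketch below, in Python with only the standard library, shows one way a well-behaved crawler might perform them: `urllib.robotparser` tests a URL against the robots.txt rules, and a small `html.parser` subclass reads the ROBOTS META tag. The helper names, the user-agent `ExampleBot`, and the `www.example.com` URLs are hypothetical, and a real crawler would also download robots.txt from the site root, cache it, and handle network errors.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# robots.txt rules from the e-books example. Note the leading slash:
# Disallow paths are matched as prefixes of the URL path, so "e-books/"
# without it would not match "/e-books/guide.html".
ROBOTS_TXT = """\
User-agent: *
Disallow: /e-books/
"""


def allowed_by_robots_txt(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots" ...> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # The attribute *value* may be written "ROBOTS", "robots", etc.
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().upper())


def meta_directives(html: str) -> set[str]:
    """Return the upper-cased ROBOTS META directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    # "ExampleBot" and www.example.com are hypothetical placeholders.
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/e-books/guide.html"):
        print(url, allowed_by_robots_txt(ROBOTS_TXT, "ExampleBot", url))

    page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    directives = meta_directives(page)
    print("may index:", "NOINDEX" not in directives)
    print("may follow links:", "NOFOLLOW" not in directives)
```

A crawler written this way would skip the second URL entirely and, on a page carrying the tag above, would neither index the content nor follow its links. Nothing forces a robot to run checks like these, which is exactly why compliance remains voluntary.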
NEWS POSTINGS

If you want to keep search engines out of your news postings, you can add an "X-no-archive" line to your postings' headers:

    X-no-archive: yes

Most common news clients let you add an X-no-archive line to the headers of your postings, but some of them don't. The problem is that most search engines assume that all the information they find is public unless it is marked otherwise. So be careful: although the robot and archive exclusion standards may help keep your material out of the major search engines, there are others that respect no such rules.

If you're highly concerned about the privacy of your e-mail and Usenet postings, you should use anonymous remailers and PGP. You can read about them here:

http://www.well.com/user/abacard/remail.html
http://www.io.com/~combs/htmls/crypto.html
http://world.std.com/~franl/pgp/

Even if you are not particularly concerned about privacy, remember that anything you write will be indexed and archived somewhere for eternity, so use the robots.txt file whenever you need it.

Written by Dr. Roberto A. Bonomi

Dr. Roberto Bonomi is a successful e-book writer who shares his home business experience at http://www.easy-home-business.com. If you already have, or are looking for, an Internet home business, you can't miss the free knowledge you'll receive at his site.

Article Source: http://www.articlealley.com/
http://robertobonomi.articlealley.com/how-to-keep-robots-out-of-your-web-site-38733.html