
Block search engine bots from crawling your private files

The robots.txt file is a text file containing directives for search engine crawlers that tell them which pages they may or may not crawl. A well-behaved search engine therefore begins its exploration of a website by requesting robots.txt at the root of the site.

First of all, you have to understand the format of robots.txt.

The robots.txt file (written in lower case and plural) is an ASCII text file that sits at the root of the site and may contain the following directives:

  • User-Agent: specifies the robot that the following rules apply to.
  • The value * means “all search engines”.
  • Disallow: specifies the pages to exclude from crawling. Each page or path to exclude must be on its own line and must begin with /. The value / alone means “all pages.”

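A group of rules (a User-Agent line and the Disallow lines that follow it) should contain no blank line! A blank line is only used to separate one group from the next.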
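
To get a feel for how a crawler reads these directives, here is a minimal sketch using Python's standard urllib.robotparser module. The /private/ path, the example.com URLs, and the bot name ExampleBot are made up for illustration. It also hints at why a stray blank line matters: this particular parser treats a blank line as the end of a group, so a Disallow that follows one is no longer attached to any User-Agent. Other crawlers may be more forgiving.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical rules: keep every robot out of /private/
good = load_rules("User-Agent: *\nDisallow: /private/\n")

# Same rules, but with a blank line splitting the group in two
broken = load_rules("User-Agent: *\n\nDisallow: /private/\n")

url = "http://example.com/private/page.html"
print(good.can_fetch("ExampleBot", url))    # the Disallow line applies
print(broken.can_fetch("ExampleBot", url))  # the orphaned Disallow is ignored
print(good.can_fetch("ExampleBot", "http://example.com/index.html"))
```

Comparing the first two printed values shows the effect of the blank line; the third is a page that no rule covers.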

Now, if you want to stop search engines from accessing a specific private file, write the following rules in robots.txt-

  • Exclusion of a single page-

User-Agent: *
Disallow: /directory/path/page.html

  • Exclusion of several pages-

User-Agent: *
Disallow: /directory/path/page.html
Disallow: /directory/path/page2.html
Disallow: /directory/path/page3.html

  • Exclusion of all pages of a directory and its subfolders-

User-Agent: *
Disallow: /directory/

  • Exclusion of all pages-

User-Agent: *
Disallow: /

  • To block files of a specific file type (for example, .gif), using the * and $ wildcards that major crawlers such as Googlebot support as an extension-

User-agent: Googlebot
Disallow: /*.gif$
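
To see what a pattern such as /*.gif$ is meant to match, here is a rough Python sketch that translates a Disallow value into a regular expression: * stands for any run of characters and a trailing $ pins the match to the end of the path. This is a hand-rolled illustration, not any crawler's real implementation; the names pattern_to_regex and is_blocked and the sample paths are invented here.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Disallow pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" where "*" stood
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """True if the URL path falls under the Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None


for path in ("/images/logo.gif", "/images/logo.gif?size=2", "/images/logo.png"):
    print(path, is_blocked("/*.gif$", path))

# Without wildcards a pattern is a plain prefix match
print(is_blocked("/directory/", "/directory/path/page.html"))
```

Under this reading, only paths that end in .gif are caught by /*.gif$, while /directory/ catches everything beneath that folder.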

Now, how do you exclude every bot except those of Google, MSN, and Yahoo? Just add the following simple rules to your robots.txt and enjoy….

User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
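
If you want to sanity-check rules like these before uploading them, one option is to feed them to Python's standard urllib.robotparser. This is only a sketch: the page URL and the bot name ExampleBot are made up, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The allow-list rules from above, as one string
RULES = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical page, used only to query the parsed rules
page = "http://example.com/directory/path/page.html"
for bot in ("Googlebot", "Slurp", "msnbot", "ExampleBot"):
    print(bot, parser.can_fetch(bot, page))
```

The named bots fall under the first group, whose empty Disallow permits everything; any other name falls through to the * group and is refused.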

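With .htaccess file-
Yes, you can also block a bot by its IP address in .htaccess. Unlike robots.txt, this is enforced by the server and does not rely on the bot's cooperation.
For example, if the bot has the IP 66.249.71.xxx
The rule in .htaccess will be: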

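<Limit GET POST>
# allow,deny: Allow rules are evaluated first, then Deny rules, so the deny below wins
order allow,deny
allow from all
# xxx stands for the last part of the bot's IP
deny from 66.249.71.xxx
</Limit>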
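
Whether a deny line actually keeps anyone out depends on the order directive around it. The sketch below is a simplified Python model of how the Apache 2.2 documentation describes order allow,deny and order deny,allow; it is not Apache code. The function name is_allowed is invented here, "allow from all" is modeled as 0.0.0.0/0, and the xxx in the address is treated as "any value", i.e. the network 66.249.71.0/24. All of these are assumptions made for the example.

```python
import ipaddress


def is_allowed(order: str, client_ip: str, allow: list, deny: list) -> bool:
    """Simplified model of Apache 2.2 Order/Allow/Deny evaluation."""
    ip = ipaddress.ip_address(client_ip)
    allowed = any(ip in ipaddress.ip_network(net) for net in allow)
    denied = any(ip in ipaddress.ip_network(net) for net in deny)
    if order == "deny,allow":
        # Default is to allow; a matching Allow overrides a matching Deny
        return allowed or not denied
    if order == "allow,deny":
        # Default is to deny; a matching Deny overrides a matching Allow
        return allowed and not denied
    raise ValueError("unknown order: " + order)


ALLOW = ["0.0.0.0/0"]       # stands in for "allow from all"
DENY = ["66.249.71.0/24"]   # assumed reading of 66.249.71.xxx

bot_ip = "66.249.71.10"     # hypothetical address inside the denied range
print(is_allowed("deny,allow", bot_ip, ALLOW, DENY))
print(is_allowed("allow,deny", bot_ip, ALLOW, DENY))
print(is_allowed("allow,deny", "203.0.113.5", ALLOW, DENY))
```

In this model, combining "allow from all" with order deny,allow lets the bot through, because the Allow is evaluated last and overrides the Deny. With order allow,deny the Deny has the final word, and visitors outside the range are still served.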
