What is Robots.txt?
Robots.txt is a plain text file that contains a few lines of directives. It is stored on the server of your website or blog and instructs web crawlers on how to crawl and index your blog for search results. This means you can keep crawlers away from any page on your blog, such as your labels pages, your demo page, or any other page that is not important enough to be indexed, so that search engines do not crawl it. Always remember that search crawlers read the robots.txt file before crawling any web page.
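To make the "crawlers read robots.txt first" idea concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the crawler name ExampleBot, and the /p/demo.html path are hypothetical and chosen only for illustration; real search crawlers may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one demo page, allow everything else.
RULES = """\
User-agent: *
Disallow: /p/demo.html
Allow: /
"""


def crawlable(urls, agent="ExampleBot"):
    """Return only the URLs that the rules permit `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


pages = [
    "http://yourdomain.com/",
    "http://yourdomain.com/p/demo.html",
    "http://yourdomain.com/2020/01/first-post.html",
]
# The demo page is filtered out; the other two pages remain.
print(crawlable(pages))
```

A well-behaved crawler would run a check like this before requesting each page, which is why a blocked page is never fetched in the first place.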
How to Add a Custom Robots.txt to Your Blogger Blog
1) Log in to your Blogger account
2) Open the blog to which you want to add Robots.txt
3) Go to Settings
4) Go to Search preferences
5) Find the Crawlers and indexing section
6) Next to Custom robots.txt, click Edit
7) For "Enable custom robots.txt content?", select Yes
8) Copy the following data and paste it (you need to change “yourdomain.com” to your Blogger domain name)
9) Click Save changes
10) That’s it, you are done adding Robots.txt to your blog
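As a rough sketch of what the pasted text could look like, the Python snippet below assembles a robots.txt from the rules this post goes on to discuss. It is a reconstruction under assumptions, not necessarily the exact text intended for step 8: the helper name build_robots_txt is hypothetical, and the sitemap address (/sitemap.xml on your domain) is assumed, so check it against what your own blog serves.

```python
def build_robots_txt(domain):
    """Assemble robots.txt text for `domain` from the rules this post discusses."""
    lines = [
        "User-agent: Mediapartners-Google",
        "Disallow:",  # empty value: nothing is blocked for the AdSense crawler
        "",
        "User-agent: *",
        "Disallow: /search",
        "Allow: /",
        "",
        # Assumed sitemap location; adjust if your blog uses a different one.
        f"Sitemap: http://{domain}/sitemap.xml",
    ]
    return "\n".join(lines) + "\n"


# Replace "yourdomain.com" with your own Blogger domain name.
print(build_robots_txt("yourdomain.com"))
```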
What Do the Above Lines State?
Mediapartners-Google is the user agent for Google AdSense, which uses it to serve more relevant ads on your site based on your content. If you disallow it, you won’t be able to see any ads on the blocked pages.
Now that you know what a user agent is, what does user-agent:* mean? A user-agent marked with an asterisk (*) applies to all crawlers and robots, whether they are Bing robots, affiliate crawlers or any other client software.
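The difference between a named user agent and the asterisk group can be checked with a small Python sketch based on the standard library's urllib.robotparser. The rules are the ones discussed in this post; the crawler name ExampleBot is hypothetical and stands in for any robot that has no group of its own.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://yourdomain.com/search/label/yourlabel"

# The AdSense crawler has its own group with an empty Disallow, so nothing is blocked.
adsense_ok = parser.can_fetch("Mediapartners-Google", url)

# Any other crawler falls back to the "*" group, which blocks /search.
other_ok = parser.can_fetch("ExampleBot", url)

print(adsense_ok, other_ok)  # True False
```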
By adding Disallow you are telling robots not to crawl and index the pages. Below user-agent:* you can see Disallow: /search, which means your blog's search results are disallowed by default. You are keeping crawlers out of the /search directory that comes right after your domain name. That is, a search page like http://yourdomain.com/search/label/yourlabel will not be crawled and so should not be indexed.
Allow: / refers to the home page and everything beneath it; that is, you are specifically allowing search engines to crawl the remaining pages of your blog.
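How Disallow: /search and Allow: / work together can be sketched in a few lines of Python. This is a simplified illustration, not a full parser: the function name is_allowed is hypothetical, wildcards such as * and $ are ignored, and it assumes the common convention that the longest matching path prefix wins, with ties going to Allow.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs. The longest matching
    prefix wins; on a tie, "allow" is preferred. No match means allowed.
    """
    best_directive, best_prefix = "allow", ""
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best_prefix)
        tie_for_allow = len(prefix) == len(best_prefix) and directive == "allow"
        if longer or tie_for_allow:
            best_directive, best_prefix = directive, prefix
    return best_directive == "allow"


RULES = [("disallow", "/search"), ("allow", "/")]

for path in ["/", "/2020/01/first-post.html", "/search", "/search/label/yourlabel"]:
    print(path, is_allowed(path, RULES))
```

Because matching is done on prefixes, every address that starts with /search is blocked, while ordinary post pages fall under Allow: / and stay crawlable.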
A sitemap helps crawlers find and index all of your accessible pages, which is why the default robots.txt specifically points crawlers to your blog's sitemap. There is an issue with the default Blogger sitemap.
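To see how a crawler picks up the Sitemap line from robots.txt, here is a short Python sketch using urllib.robotparser (the site_maps() method needs Python 3.8 or later). The sitemap address shown is an assumed placeholder, not a confirmed Blogger default.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /search
Allow: /

Sitemap: http://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```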