Robots.txt Explained
Updated: 2022-07-26 / Article by: Jerry Low
The robots.txt file is a simple text document containing instructions for search engine crawlers. It tells them which pages to crawl and which ones to avoid. It’s like a sign for bots saying, “Here are the rules for using this website.”
The purpose of these files is to help search engines determine how best to crawl your site. That serves to reduce the burden on the bot and your server. After all, unnecessary requests for data won't benefit anyone in a meaningful way.
For example, there's no reason for Googlebot (or any other bot) to pull up anything but your most recent blog posts or the posts that have been updated.
How the Robots.txt File Works
The easiest way to understand how it works is to think of search engine bots as guests in your house. You have all of these things you want to show off on your walls, but you don't want guests wandering around and touching everything. So, you tell them: “Hey! Stay out of this room, please.”
That's what the robots.txt file does – it tells search engines where they should go (and where they shouldn't). You can achieve this miracle with simple instructions that follow some pre-defined rules.
Each website can have only one robots.txt file, and the file must use that exact name – no more, no less.
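Here's what that looks like in practice. A minimal example (the directory name is just a placeholder):

User-agent: *
Disallow: /private-room/

The first line says the rules apply to every bot, and the second tells them to stay out of anything under /private-room/. Everything else on the site remains open to crawling.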
Do I Need a Robots.txt File?
The short answer is yes. You should have a robots.txt file on your website.
The longer answer is that you need to know how search engine bots will crawl and index your site and then write your robots.txt file accordingly.
A properly structured and maintained robots.txt file helps search engines spend their time on the pages you actually want crawled, which can support better rankings in search results. Just don't treat it as a security tool: the file is publicly readable, and badly behaved bots are free to ignore it, so it won't keep sensitive information away from spammers and hackers.
The robots.txt file starts life as a simple, blank text document. That means you can create one with a tool as simple as a plain text editor like MS Notepad. You can also use the text editor in your web hosting control panel, but creating the file on your computer is safer.
Once you’ve created the document, it’s time to start filling it with instructions. You need two things for this to happen. First, you must know what you want the robots.txt file to tell bots. Next, you need to understand how to use the instructions bots can understand.
Part 1: What the Robots.txt File Can Do
Allow or block specific bots
Control the files that bots can crawl
Control the directories that bots can crawl
Control access to images
Define your sitemap
And more.
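To make that list concrete, here's a sample robots.txt that does several of these things at once. The bot name, paths, and sitemap URL are placeholders (the wp-admin lines are just a common WordPress example):

# Block one specific crawler from the entire site
User-agent: BadBot
Disallow: /

# Rules for every other crawler
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /images/private/

# Tell bots where to find your sitemap
Sitemap: https://www.example.com/sitemap.xml

Lines that start with # are comments, and blank lines simply keep the groups of rules easy to read.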
Part 2: Understanding How Robots.txt Syntax Works
Many people get confused when looking at robots.txt samples because the content looks like tech jargon – and to the average person, it largely is. The key to understanding robots.txt is to think like a computer.
Computers need instructions to work, and they process things based on them. The same is true for bots. They read instructions one line at a time. Each of those lines has to follow a specific format.
Here are some common commands for the robots.txt file:
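(This rundown is illustrative; exact support varies slightly between crawlers.)

User-agent: – names the bot the rules that follow apply to; * means every bot
Disallow: – a path the named bot should not crawl
Allow: – an exception that re-opens a path inside a disallowed area (supported by major crawlers such as Google and Bing)
Sitemap: – the full URL of your XML sitemap
Crawl-delay: – how many seconds a bot should wait between requests (ignored by Google but respected by some other crawlers)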
While, in some ways, robots.txt allows you to customize bot behavior, the requirements for this to work can be pretty rigid. For example, you must place the robots.txt file in the root directory of your website. That generally means public_html or www.
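In other words, crawlers will only look for the file at the root of the host (example.com is a placeholder here):

https://www.example.com/robots.txt – this is where bots look
https://www.example.com/blog/robots.txt – ignored, because it isn't at the root

If the file isn't found at the root, crawlers generally behave as if there were no restrictions at all.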
While some rules are negotiable, it’s best to understand a few standard guidelines:
Watch Your Order
Instructions in a robots.txt file are read in order, but conflicting rules are resolved differently by different crawlers: older bots that follow the original standard use the first rule that matches, while Google applies the most specific (longest) matching rule. Keep your directives ordered logically and avoid writing rules that contradict each other.
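For example, take two overlapping rules (the paths are placeholders):

User-agent: *
Disallow: /blog/
Allow: /blog/public-post/

An old-school, first-match crawler stays out of /blog/ entirely, while Google's longest-match logic still allows it to crawl /blog/public-post/. The safest approach is to write rules that don't overlap ambiguously in the first place.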
Be Detailed
When creating instructions, be as specific as possible with your parameters. The bots don’t negotiate, so tell them precisely what needs to happen.
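For instance, if you only want to keep a single file away from crawlers, block that exact file rather than its whole directory (the paths here are hypothetical):

User-agent: *
# Too broad – blocks crawling of everything under /downloads/
Disallow: /downloads/

A more precise version blocks only the file you care about:

User-agent: *
Disallow: /downloads/pricing-draft.pdf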
Subdomains Are Possible
Each subdomain can have its own robots.txt file. However, the rules in that file apply only to the subdomain where it resides.
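For example (example.com is a placeholder):

https://www.example.com/robots.txt – governs crawling on www.example.com only
https://blog.example.com/robots.txt – a separate file that governs blog.example.com only

Rules in the main domain's file won't carry over to the blog subdomain, and vice versa.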
Check The File
Writing a robots.txt file and uploading it without testing is a recipe for disaster. Make sure the rules or instructions you’re adding actually work before letting things loose; a robots.txt testing tool, such as the one in Google Search Console, can show you which URLs a given rule blocks.
Don’t Noindex Anything
Google says not to use noindex in robots.txt, and in this case it’s worth listening: support for the directive was dropped back in 2019. If you need to keep a page out of search results, use a noindex meta tag or an X-Robots-Tag HTTP header instead.
Final Thoughts
Strictly speaking, you don’t need a robots.txt file. That’s especially true for smaller or static websites that don’t have a lot of content to crawl. However, larger websites will find robots.txt indispensable in reducing resources lost to web crawlers. It gives you much better control over how bots view your website.
Founder of WebHostingSecretRevealed.net (WHSR) - a hosting review site trusted and used by hundreds of thousands of users. More than 15 years of experience in web hosting, affiliate marketing, and SEO. Contributor to ProBlogger.net, Business.com, SocialMediaToday.com, and more.