robots.txt Generator – Create robots.txt Free

Upload robots.txt to your website root:

yoursite.com/robots.txt

What is robots.txt?

robots.txt is a plain-text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that instructs web crawlers which pages they may or may not access. It follows the Robots Exclusion Protocol, a widely adopted standard respected by all major search engines including Google, Bing, and Yandex.

The file is publicly accessible — any visitor can read it by appending /robots.txt to your domain. This means it should never be used to hide sensitive information; it is a crawl directive, not a security mechanism.

How robots.txt Works

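When a search engine bot visits your site, it fetches robots.txt first. It then finds the group of rules that applies to its user agent and uses them to determine which URLs it is permitted to crawl. Under the current standard (RFC 9309), the order of rules within a group does not matter: the most specific matching rule, meaning the one with the longest matching path, wins, and Allow wins a tie. Some older parsers instead apply the first matching rule in file order. If no rule matches, the page is considered allowed.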
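
The "most specific rule wins" behaviour can be sketched in a few lines of Python. This is a minimal illustration, assuming plain prefix matching within a single group and the tie-break from RFC 9309 (Allow wins when two matching paths are equally long). The function name is_allowed and the (directive, path) tuples are hypothetical, not part of any library, and real crawlers also handle the * and $ wildcards, which this sketch ignores.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) pairs from one group,
    e.g. [("disallow", "/admin/"), ("allow", "/admin/help/")].
    Plain prefix matching only; the * and $ wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are skipped
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed
```

With the rules Disallow: /admin/ and Allow: /admin/help/, this returns True for /admin/help/faq and False for /admin/users, regardless of the order in which the two rules are listed.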

Well-behaved bots respect these rules. Malicious scrapers may not. Crawl rules apply to page fetching only — they do not prevent a URL from being indexed if it is linked from elsewhere on the web.

robots.txt Syntax Explained

Directive     Example                            Meaning
User-agent    *                                  All robots
User-agent    Googlebot                          Google's crawler only
Allow         /public/                           Allow this path (overrides a less specific Disallow)
Disallow      /admin/                            Block this path from crawling
Crawl-delay   10                                 Wait 10 seconds between requests (non-standard; ignored by Googlebot)
Sitemap       https://yoursite.com/sitemap.xml   Full URL of your XML sitemap
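
The directives in the table can also be assembled programmatically. The following is a rough sketch of how a generator might render them, not the code behind this tool: build_robots_txt and the dictionary keys it reads (user_agent, allow, disallow, crawl_delay) are hypothetical names chosen for illustration.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text.

    `groups` is a list of dicts with the key "user_agent" (str) and the
    optional keys "allow", "disallow" (lists of paths) and "crawl_delay" (int).
    `sitemaps` is an iterable of absolute sitemap URLs.
    """
    lines = []
    for group in groups:
        lines.append(f"User-agent: {group['user_agent']}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if "crawl_delay" in group:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Normalise the ending to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


print(build_robots_txt(
    [{"user_agent": "*", "disallow": ["/admin/"]}],
    sitemaps=["https://yoursite.com/sitemap.xml"],
))
```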

Common robots.txt Examples

Allow everything (default)

User-agent: *
Allow: /

Block admin and private areas

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /dashboard/

Sitemap: https://yoursite.com/sitemap.xml

Block all crawlers

User-agent: *
Disallow: /

Block Google's image crawler only

User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Disallow: /
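
This example relies on each crawler obeying only the group that names it. A simplified sketch of that selection step is shown below, assuming the basic behaviour described in RFC 9309: a case-insensitive match on the crawler's product token, with the * group as the fallback. The name select_group and the dictionary layout are hypothetical, and the sketch leaves out details such as merging several groups that name the same agent or Google's fallback from a specialised crawler to its general token.

```python
def select_group(groups, product_token):
    """Pick the rule list a crawler should obey.

    `groups` maps a User-agent value to that group's rules, e.g.
    {"Googlebot": [...], "*": [...]}. Returns [] when no group applies,
    which means everything is allowed.
    """
    wanted = product_token.lower()
    # First choice: a group that names this crawler (case-insensitive).
    for agent, rules in groups.items():
        if agent.lower() == wanted:
            return rules
    # Fallback: the wildcard group, if the file has one.
    for agent, rules in groups.items():
        if agent == "*":
            return rules
    return []


groups = {
    "Googlebot": [("allow", "/")],
    "Googlebot-Image": [("disallow", "/")],
}
print(select_group(groups, "googlebot-image"))
```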

E-commerce — block cart and account

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /wishlist/

Sitemap: https://yoursite.com/sitemap.xml

Blocking AI Crawlers

As AI companies train large language models on web content, many have introduced dedicated crawler bots. You can block them in robots.txt using their User-agent identifiers:

  • GPTBot: OpenAI's training crawler. Blocking it asks OpenAI not to crawl your pages for training ChatGPT and future OpenAI models. It does not remove content that was already collected.
  • ChatGPT-User: Used by ChatGPT when browsing the web in real time for users. Blocking it asks ChatGPT not to read your pages during live conversations.
  • Google-Extended: Google's opt-out token for training Gemini (formerly Bard) and Vertex AI models. It is a control token rather than a separate crawler, and blocking it does not affect standard Googlebot crawling or Search indexing.
  • anthropic-ai: Anthropic's crawler token for training Claude models. Anthropic's current documentation names ClaudeBot as its crawler, so consider listing that token as well.
  • CCBot: Common Crawl's bot, widely used as a training data source by many AI companies.

Use the Block AI Crawlers preset in the generator above to add these rules automatically.
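
If you prefer to produce these rules yourself, the sketch below writes one group per token with a site-wide Disallow. The preset's actual output is not shown on this page, so treat the layout as an assumption. AI_CRAWLER_TOKENS and block_ai_crawlers are hypothetical names, and the token list simply mirrors the bullets above.

```python
# Tokens taken from the list above; extend this as vendors publish new ones.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ChatGPT-User",
    "Google-Extended",
    "anthropic-ai",
    "CCBot",
]


def block_ai_crawlers(tokens=AI_CRAWLER_TOKENS):
    """Return robots.txt text that disallows the whole site for each token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # Groups are separated by a blank line; the file ends with one newline.
    return "\n\n".join(groups) + "\n"


print(block_ai_crawlers())
```

Append the result to your existing robots.txt rather than replacing it, so the rules for ordinary search crawlers stay in place.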

Frequently Asked Questions

What is robots.txt?

robots.txt is a plain-text file placed at the root of your website that tells web crawlers and bots which pages or directories they are allowed or not allowed to access. It follows the Robots Exclusion Standard and is the first file most search engine bots check before crawling your site.

Where should robots.txt be placed?

robots.txt must be placed at the root of your domain — for example, https://yoursite.com/robots.txt. It cannot be placed in a subdirectory. Every domain and subdomain requires its own robots.txt file.
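
To see which robots.txt file governs a given page, you can derive its location from the page URL. The helper below is a small sketch using Python's standard urllib.parse module; robots_url is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any subdomain and port),
    # and drop the page's own path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/blog/post?id=1"))
print(robots_url("https://shop.yoursite.com/cart/"))
```

The two calls return different files, which is why a subdomain such as shop.yoursite.com needs its own robots.txt.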

Does robots.txt affect SEO?

Yes. If you accidentally block important pages with Disallow rules, search engines cannot crawl them, so they cannot read or rank the content, and the pages usually drop out of search results or appear only as bare URLs. Conversely, using robots.txt to block low-value pages (admin panels, duplicate content) can focus crawl budget on your most valuable content.

Should I block my admin pages?

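Usually, yes. Blocking paths like /admin/, /dashboard/, and /wp-admin/ with Disallow rules prevents search engines from wasting crawl budget on those pages. Keep in mind that robots.txt is public, so listing a path also reveals it, and a blocked URL can still be indexed if other sites link to it. Protect admin areas with authentication rather than relying on robots.txt.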

What happens if I disallow everything?

If you set Disallow: / for all user agents, search engines will not crawl any page on your site. Over time your pages will drop out of search results, although URLs that other sites link to may still be listed without a description. Only use this setting intentionally, for example during site development before launch.

Can I block specific bots?

Yes. Use a specific User-agent name instead of the wildcard *. For example, User-agent: Googlebot followed by rules applies only to Google's crawlers. This lets you allow general crawling while restricting specific bots like scrapers or AI crawlers.

What is Crawl-delay?

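Crawl-delay tells bots how many seconds to wait between requests to your server. For example, Crawl-delay: 10 means the bot waits 10 seconds between page fetches. This reduces server load on smaller sites. Note that Crawl-delay is not part of the official standard and Googlebot ignores it. Google sets its crawl rate automatically, and the crawl rate limiter that Search Console once offered has been retired; to slow Googlebot down temporarily, Google recommends returning 500, 503, or 429 status codes.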
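
If you run your own crawler, you can read the delay with Python's standard urllib.robotparser module. The snippet below is a minimal sketch: ExampleBot is a hypothetical bot name, ROBOTS_TXT is sample content, and wait_seconds is a helper invented for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def wait_seconds(parser, agent, default=0):
    """Seconds to pause between requests for `agent`, or `default` if unset."""
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return delay if delay is not None else default


print(wait_seconds(parser, "ExampleBot"))
```

A polite crawler would call time.sleep() with this value between page fetches.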

How do I block AI bots?

To block AI training crawlers, add separate User-agent rules for GPTBot (OpenAI), ChatGPT-User (OpenAI browsing), Google-Extended (Google AI), CCBot (Common Crawl), and anthropic-ai (Anthropic). Use our Block AI Crawlers preset to add these rules automatically.

Does robots.txt guarantee privacy?

No. robots.txt is a public file and is not a security mechanism. It only requests that well-behaved bots comply — malicious crawlers may ignore it. Never rely on robots.txt to protect sensitive data; use authentication and server-side access controls instead.

How do I test my robots.txt?

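Use the robots.txt report in Google Search Console (Settings → robots.txt) to see which robots.txt files Google found for your site, when they were last fetched, and any parsing errors or warnings. This report replaced the older robots.txt Tester. To check whether a specific URL is blocked, use the URL Inspection tool, which replaced the Fetch as Google tool.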
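
For a quick local check before uploading, Python's standard urllib.robotparser module can evaluate a draft file. This is a rough sketch, not a substitute for testing against the search engine itself: Python's parser applies the first matching rule in file order rather than the longest match, so results can differ from Google's on files that mix Allow and Disallow for overlapping paths. ROBOTS_TXT is sample content and check is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /dashboard/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def check(path, agent="*"):
    """Return True if `agent` may fetch `path` on yoursite.com."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


for path in ["/admin/login", "/blog/", "/private/notes.html"]:
    print(path, "allowed" if check(path) else "blocked")
```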
