Robots.txt Generator
Create a production-ready robots.txt file for search engines, then validate the directives before publishing. Generate global and bot-specific rules, add sitemap and crawl hints, and catch common mistakes such as missing user-agent groups, invalid sitemap URLs, or overly broad disallow patterns.
About Robots.txt Generator
A robots.txt file tells crawlers which parts of a site they may fetch, which paths should stay out of the crawl queue, and where the XML sitemap lives. For SEO and GEO visibility, the goal is not to block everything risky by default. The goal is to protect low-value crawl paths, preserve crawl budget for canonical pages, and keep the file aligned with what is actually public on the domain. A useful robots.txt policy usually mentions concrete sections such as admin folders, checkout flows, internal search URLs, faceted filters, or staging-only blocks rather than vague “SEO settings.”
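As a sketch of that policy in practice, a minimal production file might look like the following. The hostname and paths are placeholders, not recommendations for any specific site:

```text
# Global group for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /

# Absolute sitemap URL so the signal is unambiguous
Sitemap: https://example.com/sitemap.xml
```

Note that the file blocks only concrete low-value paths and leaves content, CSS, and JavaScript crawlable.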
How to Use
- Choose generate or validate mode. Use `Generate robots.txt` to build a new file from structured inputs, or switch to `Validate existing robots.txt` if you already have a draft or live file.
- Enter crawl rules and sitemap details. Add the public site URL, your main `User-agent`, one allow or disallow path per line, and any extra bot-specific sections or sitemap URLs you need.
- Create the report. Run the tool to build the final file, review the parsed crawler groups, and inspect warnings for risky patterns such as blocked assets or missing absolute sitemap URLs.
- Publish only after review. Copy the output when the rules reflect your real crawl intent, then place the file at `/robots.txt` on the live hostname and test the deployed URL.
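One way to spot-check rules before or after deployment is Python's standard-library robots.txt parser. This is a hedged sketch with illustrative rules and URLs, not the tool's own validation logic:

```python
# Sketch: check robots.txt rules with Python's stdlib parser.
# The rules and URLs below are illustrative assumptions.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse rule text directly instead of fetching, so this runs offline;
# in production you would point set_url() at the live /robots.txt.
rp.parse("""
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines())

print(rp.can_fetch("*", "https://example.com/admin/settings"))  # → False
print(rp.can_fetch("*", "https://example.com/products/shoes"))  # → True
```

Running the same checks against the deployed `/robots.txt` URL confirms that what shipped matches what you reviewed.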
Directive Strategy and Common Mistakes
| Directive or Pattern | When It Helps | What Often Goes Wrong |
|---|---|---|
| `User-agent: *` | Creates a global rule set for most bots when no special handling is required. | People add `Allow` and `Disallow` lines before defining any user-agent group, which weakens parser clarity. |
| `Disallow: /search` | Useful for blocking internal site search pages that create thin, duplicative URL combinations. | Blocking public category or product pages by accident because the path pattern is broader than intended. |
| `Sitemap: https://example.com/sitemap.xml` | Helps crawlers discover canonical URLs and fresh content faster. | Using a relative path or an outdated staging sitemap URL in production. |
| `Crawl-delay` | Sometimes added for crawlers that document support for request throttling. | Assuming Google obeys it. Google ignores `Crawl-delay`, so it is not a universal rate-control mechanism. |
| `Disallow: /assets/js/` | Rarely needed on a normal public site. | Blocking render resources that search engines use to understand layout, functionality, and page quality. |
Practical Use Cases
On a WordPress site, a common rule is to disallow `/wp-admin/` while allowing `/wp-admin/admin-ajax.php`, because that keeps most admin screens out of crawl paths without blocking a frequently needed endpoint. On an ecommerce site, robots.txt is often used to limit crawl waste from cart pages, account areas, checkout flows, faceted navigation, or internal result pages generated by sort and filter parameters. On a staging site, a temporary site-wide disallow can be reasonable, but it should be removed before launch and rechecked after DNS or deployment changes.
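The WordPress pattern described above can be written as follows; the hostname is a placeholder:

```text
# Common WordPress pattern: block admin screens but keep the AJAX endpoint
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml
```

Because `Allow` is more specific than the `Disallow` above it, `admin-ajax.php` stays fetchable while the rest of `/wp-admin/` does not.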
The validator is also useful when inheriting a file from another team. It can catch subtle issues like a non-absolute sitemap line, a malformed `Host` directive, or a `Crawl-delay` value written as plain text instead of a number. Those details matter because a robots.txt file is simple, but production mistakes are often simple too.
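To illustrate the kind of check involved, here is a minimal sketch of one such validation rule, flagging `Sitemap:` lines that are not absolute URLs. It mirrors the issue described above but is not the tool's actual code:

```python
# Sketch of one validator check: flag Sitemap lines without scheme + host.
from urllib.parse import urlparse

def check_sitemap_lines(robots_text: str) -> list[str]:
    """Return warning messages for non-absolute Sitemap directives."""
    warnings = []
    for line in robots_text.splitlines():
        if line.strip().lower().startswith("sitemap:"):
            value = line.split(":", 1)[1].strip()
            parsed = urlparse(value)
            # An absolute sitemap URL needs both a scheme and a hostname.
            if not (parsed.scheme and parsed.netloc):
                warnings.append(f"Non-absolute sitemap URL: {value!r}")
    return warnings

sample = "User-agent: *\nDisallow: /admin/\nSitemap: /sitemap.xml"
print(check_sitemap_lines(sample))  # → ["Non-absolute sitemap URL: '/sitemap.xml'"]
```

A real validator would layer many such checks (unknown directives, malformed values, group ordering), but each one is usually this simple.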
What Robots.txt Does Not Do
Robots.txt is a crawl-management file, not an access-control system and not a guaranteed deindexing switch. If a URL is blocked but linked from elsewhere, search engines may still show that URL in search results without crawling its full content. Sensitive documents, admin tools, and private environments should be protected with authentication, network restrictions, or explicit noindex strategies on crawlable pages. That distinction is one of the most common misunderstandings in technical SEO conversations.
FAQ
What should a robots.txt file contain for a normal public website?
A sensible production file usually starts with a `User-agent` group, blocks only low-value or private crawl paths such as admin areas, internal search pages, cart flows, or account screens, and includes a sitemap URL with the full protocol and hostname. Most public sites should not block CSS, JavaScript, or their main content folders.
Does robots.txt stop a page from being indexed?
No, not by itself. Robots.txt tells crawlers what not to fetch, but a blocked URL can still be indexed based on external links or previously discovered signals. If your real goal is index control, you need a method designed for that purpose, such as noindex on crawlable pages or authentication for private resources.
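For contrast, the two standard index-control signals look like this; both must be on a response the crawler is allowed to fetch, or the signal is never seen:

```text
# In the HTML head of a crawlable page:
<meta name="robots" content="noindex">

# Or as an HTTP response header (works for non-HTML files such as PDFs):
X-Robots-Tag: noindex
```

Blocking such a page in robots.txt would hide the noindex signal itself, which is why crawl blocking and index control must not be conflated.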
Should I add a sitemap line to robots.txt?
Usually yes. A `Sitemap:` directive is a strong operational hint for crawlers and a good habit for large, multilingual, or frequently updated sites. Use an absolute URL rather than a relative path so the signal is unambiguous.
Why is crawl-delay flagged in the report?
Crawl-delay is not supported consistently across major search engines. Some crawlers may recognize it, but Google does not. The tool flags it so you treat it as a targeted directive rather than a universal crawl-rate setting.
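If you do use it, scope it to a bot-specific group rather than the global one. In this sketch, `SomeBot` is a placeholder for a crawler whose documentation confirms `Crawl-delay` support:

```text
# Bot-specific group for a crawler that documents Crawl-delay support
User-agent: SomeBot
Crawl-delay: 10
```

Crawlers that ignore the directive simply skip it, so scoping it this way limits the blast radius of any parser quirks.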
Reference this content, page, or tool as:
"Robots.txt Generator" at https://MiniWebtool.com/ from MiniWebtool, https://MiniWebtool.com/
by the MiniWebtool team. Updated: 2026-03-09