Draft Robots.txt Rules Without Blocking Public Pages
A practical robots.txt checklist for sitemap discovery, private routes, AI crawler sections, and avoiding accidental blocks on public tools.
Introduction
Robots.txt is a crawl access file, not a privacy layer. It helps crawlers discover sitemaps and avoid selected paths, but a blocked URL can still be discovered, and even indexed, if it is linked, logged, or shared elsewhere.
The Robots.txt Generator helps you draft a clear file for common launch patterns: public pages allowed, admin routes blocked, sitemap declared, and crawler-specific sections kept readable.
Real-world scenario
You are launching a tools site with public pages, account pages, admin pages, and an XML sitemap. A simple starting point might look like this:
User-agent: *
Allow: /
Disallow: /admin
Disallow: /dashboard
Disallow: /api
Sitemap: https://example.com/sitemap.xml
The important part is not the number of rules. It is making sure public routes like /tools, /docs, and /blog are not caught by a broad disallow pattern.
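One way to check this is to run a few representative paths against the draft rules. The sketch below is a simplified matcher, not a full robots.txt parser: it assumes a single `User-agent: *` group, plain path prefixes with no `*` or `$` wildcards, and the longest-match behavior described in RFC 9309. The names `RULES` and `is_allowed` are illustrative only.

```python
# Simplified sketch: one user-agent group, prefix rules only, no wildcards.
# RULES mirrors the example file above; adjust it to match your own draft.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin"),
    ("disallow", "/dashboard"),
    ("disallow", "/api"),
]


def is_allowed(path, rules=RULES):
    """Return True if the longest matching rule allows the path."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        # Longer prefixes win; on a tie, prefer the allow rule.
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


for path in ["/tools", "/docs", "/blog", "/admin/users", "/api/v1", "/apiary"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

In this sketch the hypothetical path /apiary is reported as blocked because it shares the /api prefix. That is the kind of accidental match worth catching before launch. Some parsers apply rules in file order rather than by longest match, so results from other tools may differ.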
What to check
Sitemap line. Include the final production sitemap URL.
Public paths. Confirm that tools, docs, blog, facts, and policy pages remain crawlable if they should be indexed.
Private or account paths. Block obvious admin, dashboard, checkout, and sign-in paths from crawling where appropriate.
Separate page-level indexing. Use the robots meta tag for noindex decisions on individual pages instead of trying to solve everything in robots.txt. A crawler has to fetch a page to see its noindex tag, so do not also disallow that page.
Common mistakes
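Blocking too broadly. Rules match by path prefix, so Disallow: /tools blocks /tools, everything under /tools/, and any other path that starts with /tools. That can remove your main library from crawl access.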
Using robots.txt for secrets. Do not put sensitive URLs in robots.txt as if that hides them.
Forgetting production hostnames. Sitemap URLs should use the canonical host, not localhost or staging.
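The hostname mistake is easy to catch automatically. The following is a minimal sketch that scans a robots.txt string for Sitemap lines and reports any that are missing, not absolute, or on an unexpected host. `CANONICAL_HOST` and `sitemap_problems` are hypothetical names, and example.com stands in for your production host.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "example.com"  # assumption: replace with your production host


def sitemap_problems(robots_txt, canonical_host=CANONICAL_HOST):
    """Return a list of human-readable problems with the Sitemap lines."""
    problems = []
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons are kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    if not sitemaps:
        problems.append("no Sitemap line found")
    for url in sitemaps:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            problems.append(f"not an absolute URL: {url}")
        elif parts.hostname != canonical_host:
            problems.append(f"unexpected host {parts.hostname!r}: {url}")
    return problems


draft = "User-agent: *\nDisallow: /admin\nSitemap: http://localhost:3000/sitemap.xml"
for problem in sitemap_problems(draft):
    print(problem)
```

An empty list means the Sitemap lines passed these checks. It does not confirm that the sitemap itself exists or is valid.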
Practical QA pass
Read the final robots.txt as a crawler would: from top to bottom, one directive per line, grouped by user agent. A file that looks readable in code but ships as one long line can be hard for tools and humans to inspect. Keep the production response plain and predictable.
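A quick script can flag a file that shipped with directives fused onto one line. This is a rough sketch: it only looks for the four directive names used in this article, and `fused_lines` is an illustrative name rather than part of any tool.

```python
import re

# Assumption: these are the only directive names this check looks for.
DIRECTIVE = re.compile(r"\b(user-agent|allow|disallow|sitemap)\s*:", re.IGNORECASE)


def fused_lines(robots_txt):
    """Return (line_number, text) for lines that hold more than one directive."""
    flagged = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        content = line.split("#", 1)[0]  # ignore trailing comments
        if len(DIRECTIVE.findall(content)) > 1:
            flagged.append((number, line))
    return flagged


one_line = "User-agent: * Disallow: /admin Sitemap: https://example.com/sitemap.xml"
for number, text in fused_lines(one_line):
    print(f"line {number} has multiple directives: {text}")
```

Run it against the text your production server actually returns, not the source file in your repository, since build or templating steps are where line breaks tend to get lost.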
Then test representative paths instead of only reviewing the file. Check that /tools, /blog, /docs, /facts, and /zh are not blocked if they are meant to be public. Also check that /admin, /dashboard, checkout, and sign-in paths are not being promoted through the sitemap or internal marketing links.
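The second check, that blocked paths are not promoted through the sitemap, can be sketched as a cross-check between sitemap URLs and disallowed prefixes. The prefix list and the function name `blocked_sitemap_urls` are assumptions for illustration, and the URL list would come from your own sitemap.

```python
from urllib.parse import urlsplit

# Assumption: prefixes copied from the Disallow lines of the robots.txt under review.
DISALLOWED_PREFIXES = ["/admin", "/dashboard", "/api"]


def blocked_sitemap_urls(urls, prefixes=DISALLOWED_PREFIXES):
    """Return sitemap URLs whose path starts with a disallowed prefix."""
    blocked = []
    for url in urls:
        path = urlsplit(url).path or "/"  # treat a bare host as the root path
        if any(path.startswith(prefix) for prefix in prefixes):
            blocked.append(url)
    return blocked


sitemap_urls = [
    "https://example.com/tools",
    "https://example.com/blog",
    "https://example.com/dashboard/settings",
]
for url in blocked_sitemap_urls(sitemap_urls):
    print("listed in sitemap but disallowed:", url)
```

Any URL this reports is sending mixed signals: the sitemap invites crawling while robots.txt forbids it.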
Limits
Robots.txt guides compliant crawlers. It does not remove indexed URLs by itself, protect private data, or replace access control.
Next steps
- Robots.txt Generator — draft crawl rules and sitemap directives
- Meta Robots Tag Generator — prepare page-level indexing directives
- Sitemap URL Checker — review sitemap URLs before submission
- Canonical URL Generator — confirm preferred URLs for indexable pages
Final practical note
Before deploying robots.txt, check it as plain text. Each directive should be on its own line, and the sitemap should be visible without relying on JavaScript.