Should you block all AI crawlers in robots.txt?

No. Blocking all AI crawlers can reduce visibility in generated answers, conversational search and assisted journeys. The better approach is to separate training crawlers, search crawlers, user-triggered agents and truly sensitive areas of the website.

Does blocking GPTBot stop ChatGPT from citing my site?

Not necessarily. GPTBot concerns the potential use of content for training OpenAI models. For search and citations in ChatGPT, OpenAI documents a separate crawler, OAI-SearchBot. Blocking GPTBot can therefore opt your content out of training without, by itself, removing it from ChatGPT Search.

Does Google-Extended block AI Overviews?

No. Google says Google-Extended does not affect inclusion or ranking in Google Search. Search AI features such as AI Overviews or AI Mode are controlled through the usual Google Search mechanisms, for example nosnippet, data-nosnippet, max-snippet or noindex.

What is the difference between blocking training and blocking AI visibility?

Blocking training refuses the use of your content to improve a model. Blocking AI visibility prevents an engine or agent from reading your content and reusing it in an answer. The two decisions have different effects: the first restricts one use of the content, the second can reduce discoverability.

Does robots.txt really protect sensitive content?

No. robots.txt expresses a preference to respectful crawlers, but it does not technically block an HTTP request. Sensitive content should be protected with authentication, access rights, network rules, a WAF, rate limiting, strict environment separation and, where relevant, noindex.

Does Perplexity respect robots.txt?

Perplexity documents declared crawlers, but in 2025 Cloudflare published observations of stealth crawling attributed to Perplexity despite robots.txt directives and WAF rules. For this type of crawler, complement robots.txt with logs, CDN/WAF rules and traffic monitoring.

Can blocking AI crawlers have legal value in Europe?

Potentially, yes. In Europe, Directive 2019/790 allows rights holders to reserve some text and data mining uses, including through machine-readable means for online content. This decision should still be reviewed by a legal professional for your specific situation.

Which AI crawlers should a B2B website block first?

For a B2B site that needs visibility, it is often better to keep useful search and citation crawlers, then selectively block training crawlers and tokens such as GPTBot, ClaudeBot, Google-Extended or Applebot-Extended if the company wants to reserve these uses.

Should I block ChatGPT-User or Claude-User?

Not by default. ChatGPT-User and Claude-User correspond to user-triggered access. Blocking them may prevent an AI-assisted prospect or customer from viewing your website. It is usually better to limit them on sensitive areas and keep them on useful public content.

How do you know whether an AI crawler policy works?

Check that robots.txt is reachable and its directives are valid, then monitor server and CDN logs: user-agents, frequency, crawled pages, HTTP statuses and IPs. Finally, observe the effect in search engines, ChatGPT, Claude, Perplexity or the other AI answers you track.

SEO

Level: Advanced

Should you block AI crawlers? Protect content without disappearing

GPTBot, Google-Extended, ClaudeBot, Applebot-Extended, Perplexity: a decision method for blocking, allowing or monitoring AI crawlers without breaking search visibility.


AI crawler decision between visibility, training and content protection

AI crawlers · SEO · rights

Blocking AI crawlers is not a binary decision. You have to separate visibility, training, search, user-triggered actions and real content protection.

Allow: crawlers that support discovery, citations and useful user journeys.
Reserve: content you do not want used for model training or specific AI uses.
Limit: crawlers that are too frequent or opaque, or that bring no visible benefit to the site.
Protect: sensitive areas, with real controls rather than robots.txt alone.

References checked on June 21, 2026. This article is not legal advice: it helps frame an SEO, GEO, technical and editorial decision before changing a robots.txt file.

Short answer

Do not block "AI" as a whole. Decide which uses of your website you allow.

The weak decision is to add a few lines to robots.txt to "block ChatGPT", "block Gemini" or "block all AI crawlers" without separating use cases. A single company may operate several crawlers: one for training, one for search, one for user-triggered actions, and sometimes a robots.txt token that is not a crawler at all.

The better decision starts with a sharper question: do you want to stay visible in search engines, AI answers and assisted journeys while refusing specific training or reuse of your content? In most cases, the answer is neither "open everything" nor "close everything".

Edikka position

Keep visibility open, reserve rights on sensitive content, and technically protect what should never be public.

Decision framework

Four different use cases hide behind the phrase "AI crawler".

Before touching robots.txt, classify each crawler by role. A training crawler does not have the same impact as a search crawler. A user-triggered agent does not have the same status as an automatic crawler. And a traditional search engine crawler can still be the necessary gateway to AI features embedded in that search engine.

Classic indexing

Googlebot, Bingbot or Applebot are primarily used to discover, index and rank pages. Blocking them often means reducing search visibility.

AI search

Some crawlers support generated answers, citations or conversational search. Blocking them can reduce presence in those answers.

Training

Other crawlers or tokens help control whether content may be used to improve models. This is often the most legitimate scope to reserve.

User action

Agents may visit a page because a user asked for research, comparison or an action. Blocking them may break an intentional user journey.
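To make this classification concrete, here is a minimal Python sketch that maps product tokens from the reference table below to these four roles. `ROLE_BY_TOKEN` and `classify_user_agent` are hypothetical names, and the token-to-role assignments are an assumption based on this article's table as of June 2026; re-check them against each operator's documentation. Google-Extended and Applebot-Extended are left out on purpose: they are robots.txt control tokens and do not appear as separate User-Agent strings in requests.

```python
"""Classify a User-Agent header into the four roles described above (sketch)."""
from typing import Optional

# Assumed mapping: lowercase product token -> role. Verify before relying on it.
ROLE_BY_TOKEN: dict[str, str] = {
    "googlebot": "classic indexing",
    "bingbot": "classic indexing",
    "applebot": "classic indexing",
    "oai-searchbot": "ai search",
    "claude-searchbot": "ai search",
    "perplexitybot": "ai search",
    "gptbot": "training",
    "claudebot": "training",
    "chatgpt-user": "user action",
    "claude-user": "user action",
    "perplexity-user": "user action",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the role of the first known token found in the header, else None.

    A match only says what the client *claims* to be: user-agents are easy
    to spoof, so combine this with IP verification before trusting it.
    """
    ua = user_agent.lower()
    # Longer tokens are checked first so a specific token wins over any
    # shorter token that happens to be contained in it.
    for token in sorted(ROLE_BY_TOKEN, key=len, reverse=True):
        if token in ua:
            return ROLE_BY_TOKEN[token]
    return None
```

For example, an illustrative header such as `Mozilla/5.0 (compatible; GPTBot/1.0)` is classified as `"training"`, while an unknown client returns `None` and should be handled as ordinary traffic.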

Reference table

The main AI crawlers and tokens to know before blocking.

This table summarizes documented or publicly declared behavior as of June 21, 2026. It must be reviewed regularly: crawler names, roles and access policies move quickly.

AI crawlers, documented role and likely effect of blocking
Crawler or token	Main role	Effect of blocking	Recommended decision
Googlebot	Google Search indexing and AI features integrated into Search.	Reduces or cuts Google’s access to your pages for Search, AI Overviews and AI Mode.	Do not block if Google visibility matters.
Bingbot	Bing Search indexing and Microsoft AI experiences that rely on Bing results.	Reduces or cuts Bing’s access to your pages for Search and Copilot answers grounded in Bing.	Do not block if Bing, Edge or Copilot visibility matters.
Google-Extended	robots.txt token to control some Gemini, Vertex AI and grounding uses outside Search.	Does not affect inclusion or ranking in Google Search, but may limit some Google AI uses outside Search.	Block if you want to reserve AI use without leaving Google Search.
GPTBot	Crawl that may be used to train OpenAI models.	Signals that content should not be used for generative model training by OpenAI.	Often the first selective block to consider, while keeping OAI-SearchBot if ChatGPT citation matters.
OAI-SearchBot	Automatic crawl related to search and citations in ChatGPT.	May reduce discovery, citation and presence in ChatGPT Search.	Keep it if you want GEO visibility in ChatGPT.
ChatGPT-User	Visits triggered by user actions or requests in ChatGPT and some GPTs.	May prevent a ChatGPT-assisted user from accessing content or completing an action.	Block only if agentic use of the site is undesirable.
ClaudeBot	Collection that may contribute to training Anthropic models.	Signals exclusion of future content from Anthropic training datasets.	Block if you reserve training uses.
Claude-SearchBot	Search crawler used to improve Claude search results and answers.	May reduce visibility and accuracy of your content in Claude search answers.	Decide according to your AI visibility strategy.
Claude-User	Web access requested by a Claude user.	May prevent Claude from retrieving your content for a user request.	Keep for useful public content; limit for sensitive areas.
Applebot	Discovery for Apple experiences such as Safari, Spotlight, Siri and Search.	May reduce discoverability across the Apple ecosystem.	Keep for strategic public pages.
Applebot-Extended	robots.txt token controlling whether content may be used to train Apple foundation models.	Does not stop Applebot from crawling; it refuses some training uses.	Block if you reserve Apple training uses.
PerplexityBot / Perplexity-User	Declared Perplexity crawlers for crawling and user access.	May reduce presence in Perplexity, but robots.txt alone may not always control observed traffic.	Manage with robots.txt, logs and network rules if the topic is sensitive.
CCBot, Bytespider, meta-externalagent, Amazonbot	Collection, indexing or AI-use crawlers depending on operators and periods.	Direct benefit is often less clear for a business website; impact must be checked in logs.	Monitor, document and block only if the risk-benefit ratio is unfavorable.

Tracked sources: OpenAI crawlers · Google AI features · Google-Extended · Microsoft Bing / Copilot · Anthropic crawlers · Applebot · Cloudflare · Perplexity

The Google trap

Blocking Google-Extended does not remove you from AI Overviews.

This is the most counter-intuitive point. Google-Extended controls some uses related to Gemini, Vertex AI and grounding in Google systems other than Search. Google explicitly says this token does not affect inclusion or ranking in Google Search.

AI features integrated into Search, such as AI Overviews or AI Mode, rely on the usual Search controls. If you want to limit what Google can show in those experiences, Google points to nosnippet, data-nosnippet, max-snippet or noindex. These controls also apply to classic results: nosnippet removes the regular snippet, and noindex removes the page from Search altogether.
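For reference, these controls are written as a robots meta tag, an `X-Robots-Tag` HTTP header or a `data-nosnippet` attribute. The Python sketch below only assembles those strings from a chosen set of directives; the helper names are hypothetical, and it deliberately accepts only the controls named above.

```python
"""Assemble Google Search display controls as HTML and HTTP header strings.

Hypothetical helper names; the directives themselves (noindex, nosnippet,
max-snippet, data-nosnippet) are the ones Google documents.
"""
import html
from typing import Optional

SUPPORTED_FLAGS = {"noindex", "nosnippet"}  # limited to this article's controls


def robots_value(flags: set[str], max_snippet: Optional[int] = None) -> str:
    """Return the comma-separated directive list shared by the meta tag and the header."""
    unknown = flags - SUPPORTED_FLAGS
    if unknown:
        raise ValueError(f"unsupported directives: {sorted(unknown)}")
    parts = sorted(flags)
    if max_snippet is not None:
        # 0 means no text snippet (like nosnippet); -1 lets Google choose the length.
        if max_snippet < -1:
            raise ValueError("max-snippet must be -1 or greater")
        parts.append(f"max-snippet:{max_snippet}")
    return ", ".join(parts)


def robots_meta_tag(value: str) -> str:
    """HTML form, placed in the page <head>."""
    return f'<meta name="robots" content="{html.escape(value)}">'


def x_robots_tag(value: str) -> tuple[str, str]:
    """HTTP header form, also usable for non-HTML files such as PDFs."""
    return ("X-Robots-Tag", value)


def exclude_from_snippet(text: str) -> str:
    """Mark one passage with data-nosnippet; the rest of the page stays eligible."""
    return f"<span data-nosnippet>{html.escape(text)}</span>"
```

For instance, `robots_meta_tag(robots_value(set(), max_snippet=50))` yields `<meta name="robots" content="max-snippet:50">`. Whichever form you choose, it applies to classic results as well as AI features in Search.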

Key takeaway

You cannot cleanly opt out of AI Overviews without touching how Google can display your content in Search.

Europe

Blocking can also express a rights reservation.

In Europe, the topic is not only technical. Article 4 of Directive 2019/790 on copyright in the Digital Single Market creates an exception for text and data mining, but this exception applies only if the use has not been expressly reserved by rights holders, including through machine-readable means for online content.

The AI Act adds obligations for providers of general-purpose AI models, including a policy to comply with Union copyright law. The GPAI Code of Practice also includes a Copyright chapter designed to help providers demonstrate compliance. In practice, robots.txt can become an editorial and legal governance layer, not only an SEO tool.

Caution remains essential: Edikka is not a law firm. But a European website that publishes proprietary content, studies, document bases or editorial corpora should document explicitly what it allows and what it reserves.

Policy to document

The robots.txt decision should be linked to a content policy.

DSM Directive
Possible rights reservation against some text and data mining uses, including by machine-readable means.
AI Act
Obligations for GPAI model providers, including a policy to comply with Union copyright law.
GPAI Code
Copyright chapter proposing practical compliance measures for model providers.
Website
robots.txt, terms of use, logs, CDN and editorial governance should tell the same story.

This reading must be validated according to your activity, rights and target jurisdictions.

EU sources: Directive 2019/790, article 4 · AI Act, article 53 · GPAI Code of Practice

Technical limit

robots.txt is a declared preference, not a security wall.

A respectful crawler reads robots.txt and applies the instructions. An opportunistic scraper can ignore it, change IP address, change user-agent or route through third-party providers. Cloudflare documented stealth crawling behavior attributed to Perplexity despite blocking directives and WAF rules.

The conclusion is not that robots.txt should be abandoned. The conclusion is that it needs the right role. It expresses a preference, an access policy and sometimes a machine-readable rights reservation. It does not protect a confidential file, private PDF, internal endpoint, staging environment or back office.
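Because a user-agent can be copied by anyone, one way to separate a declared crawler from an impostor is to check the source IP against the ranges its operator publishes (several operators, OpenAI among them, publish IP lists for their crawlers, and Google also documents a reverse DNS check for Googlebot). The Python sketch below uses the standard `ipaddress` module. `PUBLISHED_RANGES` and `verify_claimed_crawler` are hypothetical names, and the CIDR blocks are RFC 5737 documentation addresses used as placeholders, not real crawler ranges; load the current lists from each operator instead.

```python
"""Check whether a request claiming a crawler identity comes from published ranges (sketch)."""
import ipaddress
from typing import Optional

# Placeholder ranges (RFC 5737 documentation blocks), NOT real crawler IPs.
# Replace with the operator's current published list and refresh it regularly.
PUBLISHED_RANGES: dict[str, list[str]] = {
    "gptbot": ["192.0.2.0/24"],
    "oai-searchbot": ["198.51.100.0/25"],
}


def verify_claimed_crawler(token: str, client_ip: str) -> Optional[bool]:
    """True if the IP is inside the token's published ranges, False if not,
    None if no ranges are known for this token (no decision possible)."""
    ranges = PUBLISHED_RANGES.get(token.lower())
    if ranges is None:
        return None
    ip = ipaddress.ip_address(client_ip)
    # An IPv6 address is simply "not in" an IPv4 network, so mixed versions return False.
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)
```

A `False` result for a request whose user-agent claims to be GPTBot is a candidate for a WAF rule rather than proof of abuse: published lists change, so an outdated copy can produce false alarms.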

What robots.txt can do

Give instructions to declared crawlers, document a preference, reduce some respectful crawls, express a machine-readable reservation.

What it cannot do

Block HTTP access, authenticate an agent, protect sensitive data, stop a hostile scraper or hide an already known URL.

What to add

Authentication, noindex, X-Robots-Tag, WAF, rate limiting, IP verification, logs, alerts and strict separation between public and private areas.
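As an illustration of such real controls, here is a minimal WSGI middleware sketch using only Python's standard library: it requires HTTP Basic authentication on private path prefixes and adds `X-Robots-Tag: noindex` to those responses. The prefixes and credentials are hypothetical placeholders. In production, authentication usually belongs in your framework, reverse proxy or identity provider, and Basic authentication must only run over HTTPS.

```python
"""Protect private prefixes with Basic auth and mark them noindex (stdlib-only sketch)."""
import base64
import hmac
from typing import Callable, Iterable

# Hypothetical values. Note: "/admin" without a trailing slash is NOT covered;
# normalize paths in real code.
PRIVATE_PREFIXES = ("/admin/", "/client/", "/private/")
EXPECTED = "editor:change-me"  # load from a secret store, never hard-code


def _authorized(environ: dict) -> bool:
    scheme, _, encoded = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        supplied = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(supplied.encode(), EXPECTED.encode())


def protect_private(app: Callable) -> Callable:
    """Wrap a WSGI app; public paths pass through unchanged."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")
        if not path.startswith(PRIVATE_PREFIXES):
            return app(environ, start_response)
        if not _authorized(environ):
            start_response("401 Unauthorized", [
                ("WWW-Authenticate", 'Basic realm="private"'),
                ("X-Robots-Tag", "noindex"),
                ("Content-Type", "text/plain; charset=utf-8"),
            ])
            return [b"Authentication required\n"]

        def start_with_noindex(status, headers, exc_info=None):
            # Even authorized responses stay out of search indexes.
            return start_response(status, list(headers) + [("X-Robots-Tag", "noindex")], exc_info)

        return app(environ, start_with_noindex)
    return middleware
```

The design choice is that access control, not a crawler instruction, decides who reads the private area; the `noindex` header is only a second layer in case a private URL is ever exposed.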

Edikka matrix

The right policy depends on the website’s business model.

An agency website, a media publisher, an ecommerce store and a SaaS product do not have the same interest in opening or closing content. Crawl policy should start from content value, visibility need, copy risk and the technical ability to control access for real.

Recommended AI crawler policy by website type
Website type	Main objective	Recommended policy	Mistake to avoid
B2B showcase site	Be found, understood, recommended and contacted.	Keep AI search crawlers, optionally reserve training, protect forms and endpoints.	Blocking all AI crawlers and losing useful citations.
Media or publisher	Preserve editorial value and negotiate content use.	Clear rights reservation, training blocks, premium policy, log monitoring and section-by-section decisions.	Leaving the whole corpus open by inertia.
Ecommerce	Stay visible in comparisons, prices, products and shopping assistants.	Open public product pages, control dynamic stock/prices, protect accounts, carts and checkout.	Blocking useful agents or exposing sensitive actions without safeguards.
SaaS	Make the offer, documentation and use cases understandable.	Open marketing and public docs, reserve proprietary content, authenticate the app and APIs.	Confusing public documentation with customer data.
Training or premium content	Sell access to structured knowledge.	Open excerpts, proof pages and outlines; reserve full modules and paid resources.	Putting paid content only behind an unlinked URL.
Intranet, staging, back office	Prevent unauthorized access.	Authentication, IP allowlist, noindex, network blocking, not only robots.txt.	Believing `Disallow: /` protects a private area.

robots.txt examples

Three typical configurations to adapt before publication.

These examples are starting points. They must be tested, documented and adapted to your objectives. User-agents change: always verify names in official documentation before going live.

01 · Open and measured

For a website that mainly wants visibility and temporarily accepts AI uses while monitoring logs.

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

02 · Selective Edikka

To remain visible in search and useful answers while reserving the most obvious training uses.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /client/
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
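Before publishing, one quick way to check that example 02 behaves as intended is Python's standard `urllib.robotparser`. Two caveats: it matches on the product token you pass (for example `GPTBot`), not on a full User-Agent string, and it applies the original first-match rule rather than the longest-match precedence used by Google and RFC 9309, so files mixing overlapping Allow and Disallow lines can evaluate differently. The expectations listed are our reading of example 02, not observed crawler behavior.

```python
"""Check example 02 against expected allow/deny decisions with the standard library."""
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /client/
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

# (product token, URL, expected can_fetch result)
EXPECTATIONS = [
    ("GPTBot", "https://www.example.com/blog/ai-crawlers", False),
    ("ClaudeBot", "https://www.example.com/", False),
    ("OAI-SearchBot", "https://www.example.com/blog/ai-crawlers", True),
    ("Claude-SearchBot", "https://www.example.com/blog/ai-crawlers", True),
    ("Googlebot", "https://www.example.com/blog/ai-crawlers", True),
    ("Googlebot", "https://www.example.com/admin/login", False),
    ("OAI-SearchBot", "https://www.example.com/private/report.pdf", False),
]


def check_policy(robots_txt: str, expectations) -> list[str]:
    """Return a description of every expectation the file does not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for agent, url, expected in expectations:
        actual = parser.can_fetch(agent, url)
        if actual != expected:
            failures.append(f"{agent} {url}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    problems = check_policy(ROBOTS_TXT, EXPECTATIONS)
    print("OK" if not problems else "\n".join(problems))
```

An empty result only means the file says what you intended; whether each crawler honors it is a separate question that logs answer.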

03 · Strong protection

For a media publisher, premium corpus or site that wants to strongly limit public AI use. Complete it with CDN, WAF and contractual rules.

# Keep Googlebot and Bingbot for Search, block the main declared AI crawlers.

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: *
Disallow: /premium/
Disallow: /private-resources/

Sitemap: https://www.example.com/sitemap.xml

Method

Before blocking, audit what AI crawlers already see.

Map

Classify content by value and risk.

Separate public pages, conversion pages, studies, images, PDFs, documentation, paid resources, staging, back office and APIs. A single policy for the whole domain is rarely optimal.

Observe

Read logs before deciding.

Identify user-agents, frequency, requested pages, HTTP statuses, IPs, countries and traffic spikes. A crawler that never appears in your logs does not necessarily deserve a priority decision.
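A first pass over access logs does not require a dedicated tool. The Python sketch below assumes the common Apache/Nginx "combined" log format and a hypothetical `WATCHED` list of tokens taken from the reference table; it counts hits, HTTP statuses, most requested paths and distinct IPs per declared crawler. Because user-agents can be spoofed, these figures describe declared traffic only.

```python
"""Summarize declared AI crawler traffic from combined-format access logs (sketch)."""
import re
from collections import Counter, defaultdict
from typing import Iterable

# Combined format: IP ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical watch list; extend it from the reference table.
WATCHED = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "Claude-SearchBot", "Claude-User", "PerplexityBot", "Perplexity-User",
           "CCBot", "Bytespider", "meta-externalagent", "Amazonbot")


def summarize(lines: Iterable[str]) -> dict[str, dict]:
    hits: Counter = Counter()
    statuses: dict = defaultdict(Counter)
    paths: dict = defaultdict(Counter)
    ips: dict = defaultdict(set)
    by_length = sorted(WATCHED, key=len, reverse=True)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip malformed or non-combined lines
        ua = m["ua"].lower()
        # Longest token first, in case one watched token is contained in another.
        bot = next((t for t in by_length if t.lower() in ua), None)
        if bot is None:
            continue
        hits[bot] += 1
        statuses[bot][m["status"]] += 1
        paths[bot][m["path"]] += 1
        ips[bot].add(m["ip"])
    return {
        bot: {
            "hits": n,
            "statuses": dict(statuses[bot]),
            "top_paths": paths[bot].most_common(3),
            "distinct_ips": len(ips[bot]),
        }
        for bot, n in hits.items()
    }
```

Usage might look like `summarize(open("access.log", encoding="utf-8"))`, with a hypothetical file path. A high share of 4xx statuses or many distinct IPs for one token are good starting points for the IP verification step.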

Decide

Choose by use case, not by fear.

Keep crawlers that support discovery and useful citation. Reserve training when content is strategic. Close areas that have no reason to be read by a public crawler.
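Keeping the decision as data makes it easier to date, review and regenerate the file. In this Python sketch, `POLICY`, `CLOSED_FOR_ALL` and `render_robots_txt` are hypothetical names: the function writes one blocking group per reserved token, then a catch-all group for areas closed to every crawler, following the layout of example 02.

```python
"""Render robots.txt from a documented, dated crawler policy (illustrative sketch)."""
from datetime import date

# Hypothetical policy: product token -> decision ("allow" or "reserve").
POLICY = {
    "GPTBot": "reserve",             # training
    "ClaudeBot": "reserve",          # training
    "Google-Extended": "reserve",    # Gemini / Vertex AI uses outside Search
    "Applebot-Extended": "reserve",  # Apple foundation model training
    "OAI-SearchBot": "allow",        # citations in ChatGPT search
    "Claude-User": "allow",          # user-triggered access
}
CLOSED_FOR_ALL = ["/admin/", "/client/", "/private/"]


def render_robots_txt(policy: dict[str, str], closed: list[str], sitemap: str,
                      reviewed: date) -> str:
    lines = [f"# AI crawler policy reviewed on {reviewed.isoformat()}", ""]
    for token, decision in policy.items():
        if decision not in {"allow", "reserve"}:
            raise ValueError(f"unknown decision for {token}: {decision}")
        if decision == "reserve":
            lines += [f"User-agent: {token}", "Disallow: /", ""]
    # "allow" crawlers get no group of their own: they fall back to the
    # catch-all group below, so they still respect the closed areas.
    lines.append("User-agent: *")
    # An empty "Disallow:" means nothing is disallowed for the catch-all group.
    lines += [f"Disallow: {path}" for path in closed] or ["Disallow:"]
    lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"
```

Storing the review date inside the file as a comment keeps the policy "dated", and the generated output can then go through the same checks as any hand-written file.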

Verify

Test the real effect after publication.

Check that the file is reachable, rules are syntactically valid, target crawlers read them, and visibility in Google, ChatGPT, Claude, Perplexity or Apple evolves as expected.
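The first two checks (file reachable, rules syntactically valid) can be scripted. The Python sketch below takes an already-fetched status code and body, so it runs offline. The status interpretations follow Google's documentation on how its crawlers handle robots.txt errors (other operators may behave differently), and `KNOWN_FIELDS` is a deliberately short, assumed list of recognized fields.

```python
"""Offline sanity checks for a fetched robots.txt response (sketch)."""
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
MAX_BYTES = 500 * 1024  # Google ignores content beyond 500 KiB


def diagnose(status: int, body: bytes) -> list[str]:
    """Return human-readable issues; an empty list means nothing obvious was found."""
    if status == 429 or 500 <= status <= 599:
        return [f"HTTP {status}: Google temporarily treats the site as fully disallowed"]
    if 400 <= status <= 499:
        return [f"HTTP {status}: Google treats this as no robots.txt, so nothing is blocked"]
    if not 200 <= status <= 299:
        return [f"HTTP {status}: check redirects and the final response"]
    issues = []
    if len(body) > MAX_BYTES:
        issues.append("file exceeds 500 KiB; rules after the limit are ignored by Google")
    seen_group = False
    # utf-8-sig strips a leading byte order mark if present.
    text = body.decode("utf-8-sig", errors="replace")
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep or field not in KNOWN_FIELDS:
            issues.append(f"line {number}: unrecognized line {raw.strip()!r}")
        elif field == "user-agent":
            seen_group = True
        elif field in {"allow", "disallow"} and not seen_group:
            issues.append(f"line {number}: rule before any User-agent line")
    return issues
```

A clean result does not prove that crawlers obey the file; it only removes the most common publishing mistakes before you look at logs and AI answers.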

Internal path

Blocking or allowing crawlers only makes sense inside a wider strategy: be found, be cited, be understood by agents, and measure what actually happens. Read this page with the rest of the SEO and AI visibility cluster.

Conclusion

The best AI crawler policy is selective, dated and verified.

A site that blocks everything may protect itself, but it also becomes less visible in environments where users already ask AI systems to search, compare and recommend. A site that opens everything may gain exposure, but it lets content leave without a strategy.

The right level is between the two: open public pages that should be cited, reserve training when the content justifies it, close sensitive areas with real controls, then measure the effect in logs and AI answers.

Final decision

Do not block AI by reflex. Govern each use as a visibility, rights and security decision.

Edikka vision

AI crawler governance is becoming a normal layer of SEO strategy.

The question is no longer only "can we be crawled?". The real question is: which content should remain visible, which content should be reserved, and which content should be protected by more than a declaration?

At Edikka, an AI crawler policy is not a defensive reflex. It is a trade-off between acquisition, rights, trust and security: keep open what should support discovery, reserve what is an editorial asset, and technically protect what should never depend on a simple robots.txt file.

01 Visibility

Stay citable

Public pages carrying offers, proof and useful answers should remain accessible to the right engines.

02 Rights

Reserve sensitive uses

Proprietary content can justify an explicit reservation against some training uses.

03 Control

Verify real access

Logs, CDN and network rules often say more than the robots.txt file alone.

Article FAQ

Go further on this topic

Additional answers to clarify the key points covered in this article.


Web solutions designed to perform

AI agent-ready website : preparing your site for the ...

29 out of 30 hotels have no explicit AI rule - 2026 GEO ...

What is GEO ? Definition, method and examples for ...

Measure AI visibility : Search Console, citations and ...

Structured data for SEO and GEO : helping Google and ...

AI Overviews and AI Mode : why French companies must ...
