Cloudflare today announced a new feature that allows you to signal via robots.txt whether your content can be used in Google’s AI Overviews (as well as for AI training).
- Cloudflare’s new Content Signals Policy is meant to give publishers more control over how crawlers and bots use their data, beyond traditional directives that only regulate crawling and indexing.
How it works. The policy adds three new machine-readable directives to robots.txt:
- search: permission for building a search index and showing links/snippets (traditional search).
- ai-input: permission to use content as input for AI-generated answers.
- ai-train: permission to use content for training AI models.
For example:
User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
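The signals can also be combined. For instance, a publisher that wants to stay in traditional search but opt out of both AI answers and AI training could express all three preferences on one line (an illustrative extension of the syntax above, not an example taken verbatim from Cloudflare’s announcement):
User-Agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /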
Cloudflare will automatically add these directives for millions of customer sites that already use its managed robots.txt service.
Yes, but. Google has not committed to honoring these instructions.
- Cloudflare CEO Matthew Prince told The Information (subscription required) that Google was given a heads-up about content signals but has not said whether it will respect them.
- Robots.txt directives are not legally binding, and Cloudflare acknowledged that some companies may ignore them.
Why we care. Will Google or other AI companies voluntarily comply? I doubt it. Still, this new option at least gives you a way to push back: to say “yes to search, no to AI Overviews,” a control that simply didn’t exist before. That matters because AI-generated answers have been widely criticized for eroding traffic and providing little to no value in return.
Bigger picture:
- Cloudflare says bots could exceed human traffic on the internet by 2029, raising the stakes for giving publishers tools to manage how their content is reused.
- The company has released its Content Signals Policy under a CC0 license to encourage adoption beyond its own customer base, hoping it becomes a broader industry standard.
- But Cloudflare also notes signals alone aren’t enough. Publishers who want stricter control should combine them with bot management and firewall rules (see the illustrative rule below).
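As one sketch of what that pairing could look like (the specific rule is an assumption for illustration, not something prescribed in Cloudflare’s announcement), a Cloudflare custom WAF rule can block a known AI crawler such as OpenAI’s GPTBot by its user-agent string, enforcing at the edge what the robots.txt signal only requests:
Expression: (http.user_agent contains "GPTBot")
Action: Block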
Bottom line. Unless Google and others formally recognize and adhere to these instructions, publishers remain caught in a lose-lose situation: keep content open and risk misuse, or shut it down altogether.
Cloudflare’s announcement. Giving users choice with Cloudflare’s new Content Signals Policy