We’re entering a new era for the Internet — driven by AI. This new era has started with AI bots — including website crawlers and scrapers — collecting more and more data to train AI models. Crawling and scraping aren’t new: Search engine companies have always crawled websites and scraped content to populate search results. That process has always benefited website owners, since search results drove traffic back to their sites.
But by using scraped content to train AI models, AI and search engine companies are changing the way users interact with content on the web. These AI models now generate derivative content that appears as an overview above search engine results and as responses to queries within generative AI (GenAI) tools. Users increasingly trust this derivative content, and they often never visit the original source website. This has become a problem for brands and content creators, especially media publishers, because less traffic to their websites erodes both subscription sign-ups and advertising revenue.
At the same time, this increased trust in derivative content raises issues with data provenance, intellectual property, and content misuse: In short, content creators no longer have control over their content.
AI bots also present significant security and compliance risks for all organizations, across all industries. These bots can steal intellectual property, compromise web applications, and find vulnerabilities leading to security incidents or data breaches.
We have to address the security challenges of AI bots head-on, and we have to do it now, because the threat will continue to grow. We, as cybersecurity leaders, need mechanisms for safeguarding our organizations from any and all harmful bots without restricting the opportunities of this new Internet era.
When I was pursuing a graduate degree in data science and machine learning, it became clear that AI companies would race to collect large amounts of high-quality data. The more high-quality data you collect, the better your model will be.
But the rapid rise in AI crawler activity in just the past year has been astonishing: Data from Cloudflare Radar shows that from July 2024 to July 2025, raw requests from GPTBot (which collects training data for ChatGPT) rose 147%. Over the same time period, raw requests from Meta-ExternalAgent (which helps train Meta’s AI models) rose 843%.
Meanwhile, websites continue to see activity from other types of AI bots as well. Malicious bots, for example, are not scraping content — they are scanning for web application vulnerabilities, breaking into user accounts, making fraudulent purchases, submitting spam through online forms, slowing site performance, and more.
A single malicious bot could have disastrous consequences for an organization. Imagine that you temporarily place the quarterly financial results of your publicly traded company on a staging website, planning to publish them only after the stock market closes for the day. If bots access that information early, it could surface in AI-generated answers to users’ search queries. Anyone trading your stock on that material non-public information could leave your company exposed to regulatory fines and lawsuits.
Cybersecurity leaders must focus on stopping all bots that could harm their organization. But doing so is not so simple.
AI tools are making it easier for cybercriminals — and some AI companies — to create bots that evade traditional defenses. For example, cybercriminals can use AI to develop bots that can dodge controls like location or IP address blocking by changing the bot’s signature or attack vector. AI companies — and cybercriminals — can also create AI bots that mimic human behavior to defeat CAPTCHA challenges.
AI does not just help cybercriminals make “smarter” bots. It also enables them to launch bot invasions at unprecedented scale and speed, overwhelming existing defenses and controls.
To stop malicious AI bots, and control crawling and content scraping, organizations need a multi-layered security strategy. This strategy combines static controls with more predictive, dynamic capabilities and granular governance.
Static controls provide a foundation for the multi-layered strategy, blocking large-scale bot attacks as they occur and preventing AI-powered bots from evading traditional defenses. Static controls include:
CAPTCHA-free challenges that block bots without slowing down actual users.
Multi-factor authentication (MFA), which can stop automated bots from progressing past usernames and passwords.
Rate limiting, which can stop bot-based brute-force attacks, distributed denial-of-service (DDoS) attacks, and content scraping (a minimal sketch follows this list).
Redirecting unwanted bots to alternative content to slow them down, confuse them, and purposefully waste their resources.
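To make the rate limiting idea concrete, here is a minimal sliding-window limiter sketched in Python. The window length, request budget, and per-IP keying are illustrative assumptions, not recommendations; a real deployment would enforce limits at the network edge and share counters across servers.

```python
import time
from collections import defaultdict, deque

# Illustrative values only: tune the window and budget to your own traffic.
WINDOW_SECONDS = 60
MAX_REQUESTS = 100

_recent_requests: dict[str, deque] = defaultdict(deque)  # client IP -> request timestamps

def allow_request(client_ip: str) -> bool:
    """Return True if the client is under its budget; False means block or challenge."""
    now = time.monotonic()
    window = _recent_requests[client_ip]

    # Evict timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()

    if len(window) >= MAX_REQUESTS:
        return False  # Likely a brute-force, DDoS, or scraping bot: throttle it.

    window.append(now)
    return True
```

Because this in-process version loses its counts on restart and cannot see traffic hitting other servers, production rate limiting usually lives in a shared store or at a CDN/edge layer.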
Building on that foundation, you can implement more predictive and dynamic controls that anticipate and detect bot threats before they do any damage. These capabilities include:
Monitoring real-time threat intelligence feeds to identify emerging threats before they reach your organization.
Logging detailed site traffic to understand how both authentic users and bots typically behave on your site.
Detecting behavioral anomalies, using machine learning to establish baseline user behavior and identify deviations (a simplified sketch follows this list).
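As a rough sketch of the baseline-and-deviation idea behind anomaly detection, the following Python uses simple statistics as a stand-in for the machine learning models described above; the requests-per-minute feature and the three-sigma threshold are illustrative assumptions.

```python
from statistics import mean, stdev

def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Learn a baseline (mean, standard deviation) from logged traffic rates."""
    return mean(samples), stdev(samples)

def is_anomalous(observed: float, baseline: tuple[float, float],
                 threshold_sigmas: float = 3.0) -> bool:
    """Flag clients whose behavior deviates sharply from the baseline."""
    mu, sigma = baseline
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold_sigmas

# Hypothetical requests-per-minute samples from logged human visitors.
baseline = build_baseline([12, 9, 15, 11, 14, 10, 13])
print(is_anomalous(11, baseline))   # False: consistent with normal behavior
print(is_anomalous(400, baseline))  # True: likely an automated scraper
```

Real systems model many signals at once (navigation paths, timing, headers, mouse movement), but the principle is the same: learn what normal looks like, then score deviations from it.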
You could block all bots by default. But you might actually want to allow certain AI bots to scrape your site so your content appears in AI overviews or GenAI responses. To control which AI bots can interact with your site, you need a governance layer between the bots and your content. That layer requires multiple, interrelated capabilities:
AI auditing: Auditing capabilities provide clear visibility into which bots are accessing your website and how they are interacting with it.
Cryptographic verification: To help provide that visibility, bots can identify themselves by cryptographically signing the requests coming from their service. They can state their purpose and let you decide whether to permit crawling (a simplified example follows this list).
Granular control over content: Granular control enables you to manage which bots can visit your site and which pages they can access. A publisher might block scrapers from pages where original content is monetized through ads, while a tech company might allow bots to scrape its developer documentation.
Pay per crawl: A pay per crawl capability would give you the option to charge the companies scraping your content. If a company is using your content to train its models, shouldn’t it have to pay you for that?
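To illustrate the cryptographic verification idea from the list above, here is a minimal Python sketch using Ed25519 signatures via the third-party cryptography package. The TRUSTED_BOT_KEYS registry and the choice of what gets signed are assumptions for illustration; emerging proposals such as HTTP Message Signatures define this more precisely.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# Hypothetical registry mapping bot names to public keys their operators publish.
TRUSTED_BOT_KEYS: dict[str, bytes] = {
    # "ExampleBot": b"...32 raw Ed25519 public key bytes...",
}

def verify_bot_request(bot_name: str, signature: bytes, signed_payload: bytes) -> bool:
    """Verify that a request claiming to come from `bot_name` was really signed by
    that operator's key (signed_payload covers, e.g., method, path, and date)."""
    raw_key = TRUSTED_BOT_KEYS.get(bot_name)
    if raw_key is None:
        return False  # Unknown bot: apply your default crawler policy.
    public_key = Ed25519PublicKey.from_public_bytes(raw_key)
    try:
        public_key.verify(signature, signed_payload)
        return True   # Verified identity: apply this bot's crawl permissions.
    except InvalidSignature:
        return False  # Spoofed identity or tampered request: block or challenge.
```

Once a bot’s identity is verified rather than inferred from an easily forged User-Agent string, the granular controls and pay per crawl decisions described above can be enforced reliably.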
Cloudflare offers the cloud-based services needed to build a multi-layered AI bot strategy. With Cloudflare, you can implement static and dynamic controls, as well as granular controls for precise bot management.
AI audit capabilities enable teams to monitor and control how AI bots interact with website content. You can see which AI services are accessing your site; set policies for allowing or blocking crawlers and scrapers; and track which bots follow your directives. The pay per crawl feature within AI audit will also let you monetize AI bot access, requiring bot owners to pay you for crawling and scraping your site.
With these capabilities, Cloudflare — along with the world’s leading publishers and AI companies — is building a permission-based model for the Internet. Rather than solely blocking harmful bots, we’re creating a model that benefits everyone involved: content creators as well as legitimate search engine and AI companies that are willing to pay for the content that trains their models.
There’s no doubt that AI is changing the way the Internet operates. With Cloudflare, you can protect your organization from the risks that AI introduces while capitalizing on the opportunities of the new Internet era.
This article is part of a series on the latest trends and topics impacting today’s technology decision-makers.
Learn how to support the use of AI in the enterprise while maintaining security in “Ensuring safe AI practices: A CISO’s guide on how to create a scalable AI strategy.”
Grant Bourzikas — @grantbourzikas
Chief Security Officer, Cloudflare
After reading this article you will be able to understand:
Why activity from AI crawlers and other bots is rising
The security risks that an influx of AI crawlers presents
How to take control of bots using a multi-layered security strategy