AI Crawlers Are Breaking Gov Websites — What You Can Do About It

Video Description

Speaker: Matt West

Over the past year, a new kind of bot has shown up in our server logs: AI-powered crawlers that scrape public government websites to feed large language models. These bots aren’t malicious, but they send requests in rapid bursts, often ignore robots.txt, and can overwhelm infrastructure that was not built to handle that level of traffic.

In this talk, we’ll look at what’s really happening behind the scenes: how LLM crawlers are targeting .gov sites, what patterns to watch for, and what happens when your site can’t keep up. We’ll cover practical steps you can take to detect, throttle, or block these crawlers using tools like CDNs, WAFs, and server-level protections. That includes how to configure your Drupal site to manage caching, permissions, and crawl behavior to reduce exposure and load.
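
As a rough illustration of the detection step, here is a minimal sketch (not taken from the talk) that counts requests per AI crawler in a web server access log. It assumes the common "combined" log format, where the user agent is the last quoted field, and it uses a short, illustrative list of user-agent tokens that is far from complete. The function name count_ai_crawler_hits is hypothetical.

```python
import re
from collections import Counter

# Assumption: a small, illustrative set of user-agent tokens used by
# well-known AI crawlers. A real deployment would maintain a longer list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")

# Assumption: combined log format, where the user agent is the final
# double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(log_lines):
    """Return a Counter mapping crawler token -> number of matching requests."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end with a quoted user-agent field
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once
    return counts
```

Passing an open log file to the function (any iterable of lines works) yields per-crawler totals, which can help decide whether throttling or blocking is worth pursuing. User-agent strings can be spoofed, so the counts are a starting point rather than proof of a crawler's identity.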

We’ll also talk about the policy side—when blocking is appropriate, what public content should be exposed, and how to talk to stakeholders about the risks.

Whether you’re a developer, architect, or program manager, you’ll leave with a clear understanding of the problem and a checklist of actions to protect your site without compromising transparency.