Developer Toolkit IconDeveloper Toolkit
All Articles

The Regex That Took Down Cloudflare. What ReDoS Is and How to Write Regular Expressions That Do Not Kill Your Server

A case study on catastrophic backtracking and the 2019 Cloudflare outage, used as a lens to teach developers how to write safe, performant regular expressions.

~5 min read
The Regex That Took Down Cloudflare. What ReDoS Is and How to Write Regular Expressions That Do Not Kill Your Server

July 2, 2019. Cloudflare's traffic dropped by roughly 80 percent. Not from a DDoS attack. Not from a bad deploy. From a single regular expression deployed in a new WAF rule.

The pattern contained nested quantifiers. When the regex engine tried to match it against crafted inputs, it couldn't stop backtracking. CPU hit 100 percent across every machine in every Cloudflare data center running the affected rule. For 27 minutes, a significant portion of internet traffic either slowed to a crawl or stopped entirely.

This attack has a name: ReDoS, Regular Expression Denial of Service. Most developers who write regex for validation have never heard of it. The ones who run validation on untrusted input without understanding it are carrying the same risk Cloudflare ran that morning.

How Backtracking Works

Regular expression engines process patterns by attempting to match them against input. When an attempt fails at a given position, the engine backtracks: it tries a different path through the pattern. For most patterns on most inputs, this is fast. A few combinations, a quick result.

Catastrophic backtracking happens when the number of paths the engine must try grows exponentially with input length. It doesn't time out. It doesn't give up early. It exhausts every combination before returning a result.

The canonical example:

(a+)+$

Against the input "aaaaab", this fails to match because the trailing 'b' can't satisfy $. Before the engine determines that, it tries every way to partition the 'a' characters between the outer + and the inner a+. One outer iteration consuming all five 'a' characters, then two outer iterations distributing them differently, then three, then four. For an input of length n, combinations grow roughly as 2^n.

A 25-character input of 'a' followed by 'b' can take seconds. A 40-character input might take hours. A server processing user input is handed exactly this kind of string, and the regex engine does the rest.

The Cloudflare Incident

The WAF rule Cloudflare deployed contained a pattern with multiple quantified alternations nested inside a quantified group. The postmortem they published was unusually detailed. The pattern was designed to detect HTTP protocol anomalies, passed functional tests against normal inputs, and was reviewed for correctness.

Nobody tested it against inputs designed to trigger maximum backtracking.

Cloudflare's workers run in a V8-based environment. V8's regex engine backtracks. A pattern that takes milliseconds on normal inputs can run indefinitely against an adversarial one. When that pattern runs simultaneously across every worker in every data center, the result is a global CPU spike that takes 27 minutes to remediate.

The attacker, in this case, doesn't need to know the exact pattern. They need to know roughly what structure triggers backtracking and send inputs long enough to stress it. That knowledge is public and has been for decades.

The Two Dangerous Structures

Nested quantifiers. A quantifier inside another quantified group creates the combinatorial explosion. (a+)+, (a|aa)+, (.+)+. The outer quantifier can distribute characters across the inner quantifier in exponentially many ways. When the overall match fails, every distribution gets tried.

Ambiguous alternation. When two alternatives inside a quantified group can match the same characters, the engine tries both when one path fails. (a|ab)+c against "aaab" with no 'c' will explore both alternatives at each position before giving up. The branching factor compounds across the length of the input.

The fix for both is to eliminate the ambiguity, not just add anchors. Anchors reduce starting positions. They don't reduce backtracking depth. The real solution is restructuring the pattern so there's only one way to match any given input.

Writing Patterns That Stay Fast

Prefer specific character classes over . inside quantifiers. Instead of .+, write [^<]+ or [^\n]+ or whatever character set the valid input actually contains. A tighter character class reduces the set of inputs that trigger deep backtracking.

Avoid nested quantifiers. If your pattern uses (group)+, ask whether the outer + is doing work that differs from a single quantifier on the group's contents. (a+)+ almost always simplifies to a+ with no change to behavior on valid inputs and no backtracking risk on invalid ones.

Use possessive quantifiers or atomic groups where supported. A possessive quantifier doesn't give back characters once claimed. An atomic group won't re-enter once it has matched. Java, PHP, and PCRE support both. JavaScript doesn't support either natively.

For server-side Node.js validation on untrusted input, the re2 npm package wraps Google's RE2 engine, which guarantees linear-time matching by not supporting backreferences. If your validation patterns don't need backreferences, RE2 is the safer default for anything that touches user input at scale.

Testing Before Deploying

Cloudflare's gap wasn't writing a bad pattern. It was not testing the pattern against adversarial inputs. A functional test suite that only checks whether the pattern matches what it should will never catch this class of problem.

The test you're missing: does the pattern complete quickly when given a long input designed to trigger maximum backtracking? For a validation pattern, craft a string that looks like a partial match but isn't, at 30 to 40 characters. If that test case takes more than a fraction of a second, the pattern has a problem.

The Regex Tester lets you run a pattern against multiple inputs at once and see results in real time. When writing any validation pattern that will run on user-supplied input, add at least one adversarial case: a long string of the characters your quantifiers target, followed by one character that can't match. If the tester slows noticeably, simplify the pattern before shipping.

The 27-minute outage followed a regex that worked correctly in every test that was actually run. The adversarial case just wasn't in the suite.

Free Developer Tools

Put the knowledge to work.

40 browser-based tools. No account. No data sent to a server.