Time limit for ignoracle to prevent accidental ReDoS
We've had a few accidental ReDoS incidents recently. Several times, ignore patterns made jobs noticeably slow. One job (this or last year?) took roughly a month to evaluate a bad ignore pattern on one URL. Then, two days ago, an ignore was added to job 67q6qla9panwsfvli1p8daore whose evaluation against the current URL would almost literally take forever: based on extrapolation from a few tests, I arrived at a rough estimate of 2e51 seconds, or about 48 orders of magnitude longer than the age of the universe...
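For illustration (this is not the actual pattern from that job), the classic catastrophic-backtracking shape shows how quickly the cost explodes with stdlib re: each additional character roughly doubles the work on a failing match.

```python
import re
import time

# Classic catastrophic-backtracking pattern: nested quantifiers over the
# same character. Against 'aaa...ab' it can never match, but the engine
# tries exponentially many ways to split the run of 'a's before giving up.
PATHOLOGICAL = re.compile(r'^(a+)+$')

def time_failed_match(n):
    subject = 'a' * n + 'b'
    start = time.perf_counter()
    assert PATHOLOGICAL.search(subject) is None  # never matches
    return time.perf_counter() - start

for n in (16, 20, 24):
    print(n, time_failed_match(n))  # runtime roughly doubles per extra 'a'
```

At 2e51 seconds for the real pattern, no amount of hardware helps; the only fixes are structural.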
There are three general approaches to fixing this:
- Detection: perform analysis on the pattern to determine whether it has pathological cases.
- Avoidance: switch to a regex implementation that isn't vulnerable to ReDoS, i.e. that doesn't have backtracking.
- Hammer: impose a time limit on the matching.
A quick search brought up one Python tool for detection, regexploit, but it doesn't support the full syntax of Python re expressions; in general, detection is likely imperfect anyway.

A backtracking-free regex implementation would restrict patterns to true regular expressions in the original sense (for matching regular languages), so we'd lose features like lookarounds and backreferences that are frequently useful for edge cases.

That leaves the time limit as the easiest and most practical approach. Unfortunately, Python's re/sre engine has no such mechanism. There is a third-party package, regex, which is supposedly a superset of re and does support timeouts. Alternatively, the matching could be moved to a separate process, which, unlike a thread in Python, can be killed without harm.
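The process-based variant could look roughly like this. This is a minimal sketch, not ignoracle's actual code; `match_with_timeout` and its "return None on timeout" policy are hypothetical, and a real version would want to reuse worker processes rather than spawn one per URL.

```python
import multiprocessing
import re

def _match_worker(pattern, url, queue):
    # Runs in a child process, which (unlike a thread) can be
    # terminated safely if the pattern backtracks pathologically.
    queue.put(bool(re.search(pattern, url)))

def match_with_timeout(pattern, url, timeout=5.0):
    """Return True/False for a match, or None if evaluation timed out."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_match_worker,
                                   args=(pattern, url, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return None  # pattern too slow; caller decides how to treat this
    return queue.get()
```

A per-match process is obviously expensive; the point is only that termination is clean, whereas there is no safe way to kill a CPython thread stuck inside the C-level matching loop.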