eslint-plugin-security

Better unsafe regex detector

Status: Open. davisjam opened this issue 6 years ago (35 comments).

Hi all,

I'm a systems/security researcher at Virginia Tech and have been studying the incidence of vulnerable regexes in the wild.

This plugin's unsafe-regex detector relies on safe-regex, which uses star height (the nesting depth of quantifiers) to identify unsafe regexes.
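To make the star-height idea concrete, here is a minimal TypeScript sketch that computes star height from a regex's source. It is a hypothetical helper written for illustration, not safe-regex's code (safe-regex uses its own parser and its own counting rules); `starHeight` is an assumed name, and the sketch handles only common JavaScript regex syntax. In this sketch only unbounded quantifiers (`*`, `+`, `{n,}`) raise the height, and a height above 1 is the "nested quantifier" signal.

```typescript
// Hypothetical helper: a simplified star-height calculator for JS regex source.
// NOT safe-regex's implementation. Handles literals, escapes, character
// classes, groups (including (?:...), lookarounds and named groups),
// alternation, and the quantifiers *, +, ?, {n}, {n,}, {n,m}.
// Only unbounded quantifiers (*, +, {n,}) raise the star height.
function starHeight(re: RegExp | string): number {
  const src = typeof re === "string" ? re : re.source;
  let pos = 0;

  // Parses alternatives until ')' or end of input; returns the maximum height.
  function parseSequence(): number {
    let max = 0;
    while (pos < src.length && src[pos] !== ")") {
      if (src[pos] === "|") {
        pos++;
        continue;
      }
      max = Math.max(max, parseQuantified());
    }
    return max;
  }

  // Parses one atom; every branch advances `pos` by at least one character.
  function parseAtom(): number {
    const ch = src[pos];
    if (ch === "(") {
      pos++;
      if (src[pos] === "?") {
        pos++;
        if (src[pos] === "<" && src[pos + 1] !== "=" && src[pos + 1] !== "!") {
          while (pos < src.length && src[pos] !== ">") pos++; // named group
          pos++;
        } else {
          pos += src[pos] === "<" ? 2 : 1; // skips "<=", "<!", ":", "=", "!"
        }
      }
      const inner = parseSequence();
      pos++; // consume ')'
      return inner;
    }
    if (ch === "[") {
      pos++;
      while (pos < src.length && src[pos] !== "]") {
        pos += src[pos] === "\\" ? 2 : 1; // skip escaped chars inside the class
      }
      pos++; // consume ']'
      return 0;
    }
    pos += ch === "\\" ? 2 : 1; // escape sequence or plain literal
    return 0;
  }

  // Parses an atom plus an optional quantifier (and an optional lazy '?').
  function parseQuantified(): number {
    const inner = parseAtom();
    let quantified = false;
    let unbounded = false;
    const c = src[pos];
    if (c === "*" || c === "+") {
      quantified = unbounded = true;
      pos++;
    } else if (c === "?") {
      quantified = true;
      pos++;
    } else if (c === "{") {
      const m = /^\{(\d+)(,(\d*))?\}/.exec(src.slice(pos));
      if (m) {
        quantified = true;
        unbounded = m[2] !== undefined && m[3] === ""; // {n,} has no upper bound
        pos += m[0].length;
      }
    }
    if (quantified && src[pos] === "?") pos++; // lazy quantifier
    return unbounded ? inner + 1 : inner;
  }

  return parseSequence();
}
```

Under this definition `starHeight(/(ab+)+/)` is 2, so a star-height check flags `(ab+)+` even though every iteration must begin with `a` and the input can only be split one way; that is the false positive listed under Cons.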

Pros:

  1. safe-regex is fast.
  2. safe-regex is an npm module, which makes it easy to work with.
  3. safe-regex has no non-JS dependencies.

As a result, safe-regex is great for CI use cases.

Cons:

  1. safe-regex is incorrectly implemented, and substack is not maintaining it.
  2. safe-regex has lots of false positives (e.g., `(ab+)+`).
  3. safe-regex only identifies one type of exponential-time vulnerability and ignores all polynomial-time vulnerabilities. In my research I found that, in the wild, polynomial-time vulnerabilities are far more common than exp-time vulnerabilities.
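The polynomial case can be illustrated with `/\s+$/`. Its star height is 1, so a nested-quantifier check accepts it, yet on a string of n spaces followed by a non-space character a backtracking engine retries `\s+` from every start position. The TypeScript sketch below is a simplified step-counting model (an assumption for illustration, not any real engine's internals; `backtrackSteps` is a hypothetical name) that makes the quadratic growth visible.

```typescript
// Simplified model (an assumption, not a real engine) of how a backtracking
// matcher evaluates the unanchored regex /\s+$/. It counts "steps":
// one per character consumed by \s+ and one per position where `$` is checked.
function backtrackSteps(input: string): number {
  const isSpace = (c: string): boolean => /\s/.test(c);
  let steps = 0;
  // An unanchored regex is retried from every start position.
  for (let start = 0; start < input.length; start++) {
    // \s+ is greedy: consume the whole whitespace run first.
    let end = start;
    while (end < input.length && isSpace(input[end])) {
      end++;
      steps++;
    }
    // Give back one character at a time, checking `$` after each length >= 1.
    for (let k = end; k > start; k--) {
      steps++;
      if (k === input.length) return steps; // `$` holds: match found
    }
  }
  return steps; // no match anywhere
}
```

For n spaces followed by `x`, the model performs n(n + 1) steps, so `backtrackSteps(" ".repeat(1000) + "x")` returns 1001000, while the same spaces without the trailing `x` take only n + 1 steps. Production engines may optimize some patterns of this shape, so real timings can differ from the model.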

There are some alternatives to safe-regex that report exploit strings, so you can tell whether a reported vulnerability is real.

  1. Rathnayake's rxxr2. Like safe-regex, this only checks for star-height-style vulnerabilities, but as far as I can tell it has no false positives.
  2. Wustholz's REXPLOITER. This tests star height and other exp-time vulnerabilities, plus poly-time vulnerabilities.
  3. Weideman's RegexStaticAnalysis. Like Wustholz's REXPLOITER, but it is open-source and works better.
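Because these tools report attack strings, a report can be checked by actually running the regex on one. The TypeScript sketch below assumes a common way of describing such strings (a prefix, a "pump" repeated many times, and a suffix); the `AttackSpec` shape, the function names, and the timing threshold are hypothetical and are not the output format of any of these tools.

```typescript
// Hypothetical report shape: prefix + pump repeated `reps` times + suffix.
interface AttackSpec {
  prefix: string;
  pump: string;
  suffix: string;
}

function buildAttack(spec: AttackSpec, reps: number): string {
  return spec.prefix + spec.pump.repeat(reps) + spec.suffix;
}

// Measures one match attempt in milliseconds. The regex is copied without
// the g/y flags so that `lastIndex` state cannot affect the result.
function timeMatch(re: RegExp, input: string): number {
  const fresh = new RegExp(re.source, re.flags.replace(/[gy]/g, ""));
  const start = Date.now();
  fresh.test(input);
  return Date.now() - start;
}

// Treats a report as confirmed if the attack string makes matching slower
// than `thresholdMs` (a hypothetical, machine-dependent cutoff).
function confirmsSlowMatch(
  re: RegExp,
  spec: AttackSpec,
  reps: number,
  thresholdMs: number,
): boolean {
  return timeMatch(re, buildAttack(spec, reps)) > thresholdMs;
}
```

Timing is machine-dependent, so a fixed `thresholdMs` is only a rough cutoff; increasing `reps` and watching how the match time scales is a more robust signal than a single measurement.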

Unfortunately:

  1. These alternatives all have non-JS dependencies (e.g. OCaml or Java) and have inconsistent interfaces.
  2. Some (especially Weideman) can take minutes to test a single regex.

My project vuln-regex-detector provides a convenient wrapper for these alternatives, and enforces time and memory limits to get results or fail relatively quickly.
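A wrapper like this also has to combine the results of several detectors, some of which may hit their limits. The TypeScript sketch below shows one possible policy; the `DetectorOutcome` type, `combineOutcomes`, and the policy itself are assumptions for illustration, not vuln-regex-detector's API. Enforcing the limits is assumed to happen elsewhere, for example by running each detector as a child process (Node's `child_process.spawnSync` accepts a `timeout` option).

```typescript
// Hypothetical per-detector outcome, after the wrapper has enforced its
// time and memory limits (a detector that exceeds them reports "timeout").
type DetectorOutcome =
  | { kind: "vulnerable"; attack: string }
  | { kind: "safe" }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

type Verdict = "vulnerable" | "safe" | "unknown";

// One possible combination policy (an assumption):
// - any detector produced an attack string   => "vulnerable"
// - every detector finished and said safe    => "safe"
// - otherwise (timeouts, errors, no results) => "unknown"
function combineOutcomes(outcomes: DetectorOutcome[]): { verdict: Verdict; attacks: string[] } {
  const attacks: string[] = [];
  for (const o of outcomes) {
    if (o.kind === "vulnerable") attacks.push(o.attack);
  }
  if (attacks.length > 0) return { verdict: "vulnerable", attacks };
  const allSafe = outcomes.length > 0 && outcomes.every((o) => o.kind === "safe");
  return { verdict: allSafe ? "safe" : "unknown", attacks };
}
```

Returning "unknown" when any detector times out is the conservative choice: a caller can then fall back to a cheaper check instead of trusting a partial answer.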

However, I'd be surprised if developers were willing to wait even 30 seconds for linting. To address that, I'm nearly done implementing a server, so queries can be answered with a pre-computed result from the server instead of by running the expensive computation locally. The server processes previously unseen queries in the background so that subsequent queries will get a real answer.
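The server behavior described above (answer from pre-computed results, analyze unseen queries later) can be sketched as a cache plus a work queue. This TypeScript sketch is hypothetical: `VerdictCache`, `lookup`, `processPending`, and the injected `analyze` function are illustrative names, and a real service would persist results and would likely key them on the regex flags as well as the pattern.

```typescript
type Verdict = "vulnerable" | "safe" | "unknown";

// Hypothetical server-side store: queries are answered from pre-computed
// results only; unseen patterns are queued for background analysis.
class VerdictCache {
  private readonly results = new Map<string, Verdict>();
  private readonly pending = new Set<string>();
  private readonly analyze: (pattern: string) => Verdict;

  constructor(analyze: (pattern: string) => Verdict) {
    this.analyze = analyze; // the slow detector pipeline (assumed)
  }

  // Fast path: never runs the expensive analysis itself.
  lookup(pattern: string): Verdict {
    const cached = this.results.get(pattern);
    if (cached !== undefined) return cached;
    this.pending.add(pattern); // a Set de-duplicates repeated queries
    return "unknown";
  }

  // Background step: analyze everything queued so far; returns the count.
  processPending(): number {
    const batch = Array.from(this.pending);
    this.pending.clear();
    for (const pattern of batch) {
      this.results.set(pattern, this.analyze(pattern));
    }
    return batch.length;
  }
}
```

A query for a new pattern returns "unknown" immediately; once `processPending()` has run, the same query returns the computed verdict. A real deployment would also need to decide whether to retry patterns whose analysis itself came back "unknown".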

Once that's done, would you folks be interested in hitting my server first and falling back to safe-regex if my server hasn't seen the query before? I've got a sample client that can be used with a one-line tweak for this use case.
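The proposed client flow, asking the server first and falling back to safe-regex when the server has no answer, might look like the TypeScript sketch below. `checkRegex`, `askServer`, `queryServer`, and `localIsSafe` are hypothetical names; `localIsSafe` stands in for a local check such as safe-regex. The lookup is shown synchronously for brevity, and since ESLint rule visitors run synchronously, a real integration would need a synchronous or pre-fetched lookup.

```typescript
type Verdict = "vulnerable" | "safe" | "unknown";

// Treats an unreachable server the same as an unseen query.
function askServer(pattern: string, queryServer: (p: string) => Verdict): Verdict {
  try {
    return queryServer(pattern);
  } catch {
    return "unknown";
  }
}

// Hypothetical lint-time check: use the server's answer when it has one,
// otherwise fall back to the local heuristic (standing in for safe-regex).
function checkRegex(
  pattern: string,
  queryServer: (p: string) => Verdict,
  localIsSafe: (p: string) => boolean,
): { unsafe: boolean; source: "server" | "local" } {
  const remote = askServer(pattern, queryServer);
  if (remote !== "unknown") {
    return { unsafe: remote === "vulnerable", source: "server" };
  }
  return { unsafe: !localIsSafe(pattern), source: "local" };
}
```

Treating a network failure like an unseen query keeps linting working offline: the rule degrades to today's safe-regex behavior rather than failing.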

davisjam, Mar 29 '18 18:03