eslint-plugin-regexp icon indicating copy to clipboard operation
eslint-plugin-regexp copied to clipboard

Detect super-linear worst-case runtime

Open RunDevelopment opened this issue 3 years ago • 0 comments

I think we should add a rule that reports cases for exponential backtracking and polynomial backtracking.

However, there are two problems standing in our way:

Performance

Detecting exponential backtracking is computationally expensive.

Static analysis methods might only take a few milliseconds per regex but this will quickly add up for a large codebase (e.g. Prism has about 2k regexes). Fuzzers will usually take about one second or more per regex and are completely impractical for our purposes.

Solutions

  • Since the detection itself is deterministic, a simple cache might largely solve the performance problem.

    However, I am not aware of ESLint providing such a feature.

Detection

I am not aware of any existing detector implementation in JavaScript (time of writing: 2021-04-19).

safe-regex isn't practical. It implements a detection method that simply doesn't work. Star height has little to do with exponential runtime. I.e. (a|a)*b has a star height of 1 and the (a|a)* will cause exponential backtracking, and (ab+)*c has a star height of 2 without any form of exponential backtracking.

Solutions

  • Right now, we could use my library scslre to detect at least some polynomial and exponential backtracking quickly. scslre even offers fixes for simple cases.

  • There is also this method I implemented for Prism. This method is currently used in PrismJS and Highlight.JS. The upside of this method is that it is quick. It can analyze the >2.5k regexes of Prism in less than 5 seconds. The downside of this method is that there are false negatives and it cannot offer fixes. However, the positives it does find are mostly true positives.

    I should note that half of this method is already partially implemented by the no-dupe-disjunctions rule. If we find two alternatives that are not disjoint and both are children of a star quantifier, then we can reasonably assume that this will cause exponential backtracking (e.g. (a|a)*). Changing the no-dupe-disjunctions to add this information should be fairly easy.

  • Given that refa recently added ENFAs (not published yet), someone could probably implement a detection method like RXXR fairly easily.

  • I am currently writing my Bachelor thesis on a new static analysis method for exponential and polynomial backtracking. Part of this will be a JS library implementing my new detection method. We could also use this when it's ready (probably in 3-4 months or so).


Related:

  • nodesecurity/eslint-plugin-security#28
  • sindresorhus/eslint-plugin-unicorn#153

RunDevelopment avatar Apr 19 '21 18:04 RunDevelopment