quickjs icon indicating copy to clipboard operation
quickjs copied to clipboard

Fix excessive backtracking in regex engine by introducing backtrackig limit

Open renatahodovan opened this issue 1 year ago • 14 comments

The regex engine was prone to excessive backtracking, leading to timeouts and infinite loops, particularly with patterns involving nested quantifiers.

This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded, the regex engine aborts to prevent excessive backtracking.

renatahodovan avatar May 29 '24 00:05 renatahodovan

The patch fixes a 4 years old issue reported by ClusterFuzz. This issue is possibly responsible (among others) for the poor fuzzing results.

renatahodovan avatar May 29 '24 00:05 renatahodovan

Minimal qjs repro of the infinite loop that tries to match the empty string non-greedily:

var r = new RegExp("()*?a");
r.exec(",");

Execute: ./qjs test.js

renatahodovan avatar May 29 '24 07:05 renatahodovan

I think it is better to fix the non greedy matching first. This patch only hides the real problem.

bellard avatar May 30 '24 12:05 bellard

The commit 36911f0d should fix the issue. Do you have other cases where the backtracking limitation workaround is necessary ?

bellard avatar May 30 '24 14:05 bellard

@bellard Thanks for the fix, it really fixed the recursion in the previous test case! However, I have another repro:

var r = new RegExp("(.+)+a");
r.exec("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV");

renatahodovan avatar May 30 '24 15:05 renatahodovan

This example is not about infinite recursion. It is just demonstrate valid regexp behavior for given pattern, for which regexp execution time growing very fast with growing length of input string

You are right (I think), but I believe we also agree that such inputs are either useless or malicious. Therefore, if we limit the number of recursive calls they trigger, we won't be restricting the actual capabilities of the tool, but we can prevent malicious loops. (I killed the above code after 6 minutes, and who knows how long it would have run. And the input isn't even that long...)

renatahodovan avatar May 30 '24 17:05 renatahodovan

Oh, it seems that the comment I replied to has disappeared in the meantime.

renatahodovan avatar May 30 '24 17:05 renatahodovan

I see no problem with your example because it really requires a large number of backtracking (try it in another JS engine). It is the same problem as having a JS function which takes a long time to execute.

However, there is a real problem that should be addressed: the interrupt handler registered with JS_SetInterruptHandler() is not called during regexp execution which prevents the user from easily interrupting their execution.

bellard avatar May 30 '24 17:05 bellard

@bellard Is there any update on the correct method for registering an interrupt handler during regex execution? Alternatively, could the above patch serve as a temporary solution to address the frequent timeout issues during fuzzing, which are causing early exits and poor performance?

renatahodovan avatar Jun 11 '24 21:06 renatahodovan

@bellard: I think I found a solution: we can create a fuzzer-friendly build, as suggested by the libFuzzer documentation, using the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro. This way, the backtracking limit will only be applied for fuzzing and will not affect the normal mode operation of the JS engine, while also preventing the fuzz session from continuously timing out.

renatahodovan avatar Jun 19 '24 22:06 renatahodovan

Gentle ping

renatahodovan avatar Jul 03 '24 18:07 renatahodovan

@saghul @bellard what do you think of the new fuzzer-specific approach?

renatahodovan avatar Jul 23 '24 18:07 renatahodovan

The change LGTM (perhaps the name could use some bikesheding) but I have no commit rights here :-)

saghul avatar Aug 07 '24 12:08 saghul

@saghul Thanks for the LGTM! The question about the name of the environment variable is an "easy" one, since it is a constant set by OSS-Fuzz and it is also the officially recommended name by LLVM for creating fuzzer friendly build from the target. :-)

renatahodovan avatar Aug 19 '24 08:08 renatahodovan

the issue was fixed by adding an interrupt handling in regexp.

bellard avatar Apr 05 '25 14:04 bellard