safe-regex Heuristic #1 is very restricted and result in false negatives

Context: Heuristic#1: Star height > 1, dictates there should be no repetition inside of repetition.

Issue: The regex in question is /abcd(-[0-9a-z]{10,20}){2}/, which has repetition inside of repetition but it is not a vulnerable pattern because of fixed range quantifier.

Probable Improvements:

Make the Heuristic#1 configurable like the other Heuristic#2, which takes in options from the user and matches the start height to that particular config.
Usage of Range Quantifier as a factor for vulnerability.

Please share your thoughts on the above improvements and their feasibility. I will be happy to raise a PR for it. :)

Jul 19 '21 18:07 raghavgarg1257

@davisjam Can you please share your views on this?

Jul 30 '21 06:07 raghavgarg1257

Hi @raghavgarg1257. Thanks for your interest!

For a sound approach, see #17. I have some starter code I can share for this.

For a simpler improvement to the heuristic, I would be happy to give feedback on an approach that:

Preserves the existing behavior.
Can be configured to treat as safe any nested bounded repetition for which the total amount of ambiguity is "not too big", measured using the product of the size of the ranges involved. For example, ((a{5,10}){5,50}){0,12} would have a total amount of ambiguity of (10-5)(50-5)(12-0) = 2700.
Can be configured to treat as safe any nested bounded repetition for which each part has an upper bound -- as in your example. This would be a special case of the more general bounding strategy described in the previous bullet point.

Please @ me in any replies or PRs.

Jul 30 '21 20:07 davisjam

Hello @davisjam, Thanks for sharing your thoughts. :)

I will share an approach keeping the above points in mind and will try to create a poc for the same, before that I have a couple of doubts.

How can we define "not too big ambiguity", in your opinion? I mean coming up with a static upper limit might not be the best thing to do, right?
#17 is a PR related to the updation of Readme. I guess you meant to share something else.

Aug 02 '21 19:08 raghavgarg1257

Not too big ambiguity

Well, you can put an upper bound on the ambiguity (possibly a very loose upper bound) as I described above. Then the user can provide their desired bound. A little desktop benchmarking can be used to find a "recommended" value, e.g. 50 or 100 or etc.

#17

Oops, I meant #27.

Aug 04 '21 14:08 davisjam

safe-regex safe-regex copied to clipboard

Heuristic #1 is very restricted and result in false negatives

safe-regex
safe-regex copied to clipboard