learn-regex icon indicating copy to clipboard operation
learn-regex copied to clipboard

No mention of catastrophic backtracking

Open davisjam opened this issue 6 years ago • 4 comments

Regular expressions can be vulnerable to Regular Expression Denial of Service (ReDoS). Snyk.io has a good writeup, and the .NET docs have a thorough treatment as well (1, 2).

Catastrophic backtracking affects nearly every major language, including perl, ruby, java, javascript, python, C#, C++-11, and PHP. A guide to regular expressions is incomplete without a warning about ReDoS.

davisjam avatar Feb 09 '18 01:02 davisjam

hi @davisjam, I know this comment is too late from the time you created this issue. I am very happy when i know in this repository there is a person who aware of this vulnerability of Regular Expression Denial of Service (ReDoS). I have a question that need your help. This is a simple regex: ^[01]([-]*[01]+)*$ And this is a simple sample can cause Catastrophic backtracking: 1010101010101001010& I tried many ways to convert that regex to another form to gain the same meaning of the origin one but can avoid Catastrophic backtracking. Unfortunately, i can't reach. Please help me with this. I am looking to hearing response from you and everyone. Thanks a lot,

hoangnam2261 avatar Jul 18 '19 14:07 hoangnam2261

@ziishaned I can prepare a PR with a small section about backtracking if you would be interested. Let me know.

davisjam avatar Jul 18 '19 15:07 davisjam

@hoangnam2261

To avoid the backtracking, you need to give the regex engine clear boundary points that it won't backtrack past. For example, this regex is vulnerable: (a+)+$, but this one is safe: (ba+)+ (because every time the regex engine sees a "b", it knows it can't re-use the a's before it in another path).

In your example, the problematic piece is (-*[01]+)*$. Let's see if we can repair it. Observe that the backtracking occurs because the regex engine isn't sure whether it should treat a sequence of [01]'s as "A group of [01]s (not) preceded by a -" or as "Multiple clauses of such groups" (sorry, that probably didn't make sense).

Since what you want is a sequence of [01] delimited by dashes, possibly with 0 dashes, would this regex work instead?

^[01]+([-]+[01]+)*$

Note I added a + to the first group (since it is always required) and I made the [-] non-optional.

davisjam avatar Jul 18 '19 15:07 davisjam

Thank you so much @davisjam, After reading your comment carefully, my mind become clearer.

hoangnam2261 avatar Jul 18 '19 15:07 hoangnam2261