learn-regex
learn-regex copied to clipboard
No mention of catastrophic backtracking
Regular expressions can be vulnerable to Regular Expression Denial of Service (ReDoS). Snyk.io has a good writeup, and the .NET docs have a thorough treatment as well (1, 2).
Catastrophic backtracking affects nearly every major language, including perl, ruby, java, javascript, python, C#, C++-11, and PHP. A guide to regular expressions is incomplete without a warning about ReDoS.
hi @davisjam,
I know this comment is too late from the time you created this issue. I am very happy when i know in this repository there is a person who aware of this vulnerability of Regular Expression Denial of Service (ReDoS).
I have a question that need your help. This is a simple regex: ^[01]([-]*[01]+)*$
And this is a simple sample can cause Catastrophic backtracking: 1010101010101001010&
I tried many ways to convert that regex to another form to gain the same meaning of the origin one but can avoid Catastrophic backtracking. Unfortunately, i can't reach.
Please help me with this.
I am looking to hearing response from you and everyone.
Thanks a lot,
@ziishaned I can prepare a PR with a small section about backtracking if you would be interested. Let me know.
@hoangnam2261
To avoid the backtracking, you need to give the regex engine clear boundary points that it won't backtrack past. For example, this regex is vulnerable: (a+)+$
, but this one is safe: (ba+)+
(because every time the regex engine sees a "b", it knows it can't re-use the a's before it in another path).
In your example, the problematic piece is (-*[01]+)*$
. Let's see if we can repair it. Observe that the backtracking occurs because the regex engine isn't sure whether it should treat a sequence of [01]
's as "A group of [01]s (not) preceded by a -" or as "Multiple clauses of such groups" (sorry, that probably didn't make sense).
Since what you want is a sequence of [01]
delimited by dashes, possibly with 0 dashes, would this regex work instead?
^[01]+([-]+[01]+)*$
Note I added a + to the first group (since it is always required) and I made the [-]
non-optional.
Thank you so much @davisjam, After reading your comment carefully, my mind become clearer.