dateparser SECURITY: bad regex pattern in 'dateparser/languages/locale.py' will cause 'ReDos' security problem.

SECURITY: bad regex pattern in 'dateparser/languages/locale.py' will cause 'ReDos' security problem.

Open leveryd opened this issue 3 years ago • 4 comments

Problem description

i found one bad regex pattern in 'dateparser/languages/locale.py'

r'(?:[¡¿]+|[\.!?;…\r\n]+(?:\s|$))+',  # Spanish

those pattern will cause 'ReDos' security problem, proof of code like below

import re
p = re.compile(r'(?:[¡¿]+|[\.!?;…\r\n]+(?:\s|$))+')
re.findall(p, "?"*100000+"x")

run the above code, cpu utilization will be 100% in a very long period.

more detail about 'ReDos' please see owasp.

effect of this security problem

some api will call the pattern，like below

from dateparser.search import search_dates

search_dates("?"*100000+"x")

Jan 21 '21 02:01 leveryd

Hi @leveryd

I tried running %time search_dates("x"*100000+"x") (note that the first character is an "x") and I got:

CPU times: user 3min 27s, sys: 6.04 s, total: 3min 33s
Wall time: 3 min 38s

The CPU was at 99% :sweat_smile: but it didn't hang :stuck_out_tongue:.

Then I tried with your example, changing the first character to "?": %time search_dates("?"*100000+"x"). And this is what I found:

CPU times: user 9min 23s, sys 9.61 s, total: 9 min 32s
Wall time: 9min 37s

So it seems that it's definitely having some impact, but it's not so critical as I initially thought.

After this I introduced the mentioned regex to this reDoS checker: http://redos-checker.surge.sh/ and it wasn't detected as a vulnerable regex (though it could be wrong).

Before finishing doing my tests, I tried this: %time search_dates("?"*10000+"x") (4 zeroes) and I got:

Wall time: 4.54 s

and then I tried switching from "?" to "x" (%time search_dates("x"*10000+"x")) and I got this:

Wall time: 20 s

That's definitely suspicious, and I'm not sure if it's a vulnerable regex or not...

I would appreciate any help from people with more experience with reDoS.

Jan 27 '21 08:01 noviluni

regexploit also reports this as exponential:

Redos(starriness=11, prefix_sequence=SEQ{ [27:'] }, redos_sequence=SEQ{ [21:!,2026,0a,0d,2e:.,3b:;,3f:?]{1+}{1+} [27:'] }, repeated_character=[21:!,2026,0a,0d,2e:.,3b:;,3f:?], killer=None)
Worst-case complexity: 11 ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐ (exponential)
Repeated character: [21:!,2026,0a,0d,2e:.,3b:;,3f:?]
Example: '\'' + '!' * 3456

btw: snyk reports this as vulnerability of this library: https://security.snyk.io/vuln/SNYK-PYTHON-DATEPARSER-1063229

Nov 21 '21 18:11 sebix

Unfortunately my organization's new security policies have bumped into that SNYK vulnerability where we have a couple of projects that use this library. I won't claim to be a regexpert, but it seems like a quick fix might be to limit the quantifiers to a range one could "reasonably" expect as a sentence separator. E.g., instead of:

(?:[¡¿]+|[\.!?;…\r\n]+(?:\s|$))+

Something like:

(?:[¡¿]{1,50}|[\.!?;…\r\n]{1,50}(?:\s|$))+

(I can't imagine anyone using more than 50 punctuation marks to end a sentence but just having that limit seems to cut the processing time significantly)

Would that be a satisfactory resolution to get the SNYK report closed?

Also, regexploit no longer seems to report the original regex as vulnerable, I'm not sure what changed there.

Jul 06 '22 20:07 Daveography

Any updates on that?

Aug 31 '22 11:08 macejiko

dateparser dateparser copied to clipboard

SECURITY: bad regex pattern in 'dateparser/languages/locale.py' will cause 'ReDos' security problem.

Problem description

effect of this security problem

dateparser
dateparser copied to clipboard

SECURITY: bad regex pattern in 'dateparser/languages/locale.py' will cause 'ReDos' security problem.