Prevent ReDoS in Spanish sentence splitting regex

Open Sjord opened this issue 3 years ago • 0 comments

In Spanish, questions start with an inverted (upside-down) question mark:

¿Vos bueno?

This was already handled in the original regex, but that regex was vulnerable to regular expression denial of service (ReDoS). The new regex matches either a normal end-of-sentence marker optionally followed by a ¿ or ¡, or a ¿ or ¡ on its own. One behavior change is that the normal end-of-sentence marker (.!?;…) now has to come before the ¡ or ¿, but I think this is acceptable.
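To illustrate the shape of the new regex, here is a minimal sketch. The pattern and the `split_sentences` helper are hypothetical and written for this description; they are not copied from the dateparser source. The idea is that each alternative is a plain sequence of character classes with no nested quantifiers, so the regex engine has nothing to backtrack over.

```python
import re

# Hypothetical pattern for illustration only, not the one in dateparser.
# Alternative 1: normal end-of-sentence marks, optional whitespace,
#                then optional opening ¡ or ¿ of the next sentence.
# Alternative 2: an opening ¡ or ¿ on its own.
SPANISH_SPLITTER = re.compile(r"[.!?;…]+\s*[¡¿]*|[¡¿]+")


def split_sentences(text):
    """Split text on the separators above and drop empty pieces."""
    parts = SPANISH_SPLITTER.split(text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_sentences("¿Vos bueno? ¡Hola! Nos vemos el 12 de octubre."))
    # ['Vos bueno', 'Hola', 'Nos vemos el 12 de octubre']
```

Long runs of punctuation, such as thousands of consecutive ¡ or ¿ characters, are consumed in a single greedy match by this sketch.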

This PR also adds some Spanish test cases. These exercise the sentence splitting logic, but they do not assert the exact result of the splitting.

Fixes #869

Sjord, Oct 12 '22 10:10