dateparser
Prevent ReDoS in Spanish sentence splitting regex
In Spanish, questions start with an inverted question mark (¿):
¿Vos bueno?
The original regex already handled this, but it was vulnerable to regular expression denial of service (ReDoS). The new regex matches either a normal end-of-sentence mark optionally followed by a ¿ or ¡, or a ¿ or ¡ on its own. One behavior change is that the normal end-of-sentence mark (.!?;…) now has to come before the ¡ or ¿, but I think this is acceptable.
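To make the shape of the new pattern concrete, here is a minimal sketch of a splitter built on the same idea. It is not the regex merged in this PR: the names `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical, and the `\s*` between the end-of-sentence mark and the ¿/¡ is an assumption about whitespace handling. Each alternative is a plain sequence of disjoint character classes with no nested quantifiers, which is what keeps matching linear in the input length.

```python
import re

# Hypothetical pattern for illustration only, not dateparser's actual regex.
# Alternative 1: one or more end-of-sentence marks, optional whitespace
#                (an assumption), then an optional opening ¿ or ¡.
# Alternative 2: an opening ¿ or ¡ on its own.
SENTENCE_BOUNDARY = re.compile(r"[.!?;…]+\s*[¿¡]?|[¿¡]")


def split_sentences(text):
    """Split text at sentence boundaries and drop empty fragments."""
    parts = SENTENCE_BOUNDARY.split(text)
    return [part.strip() for part in parts if part.strip()]


print(split_sentences("Hola. ¿Cómo estás? Nos vemos mañana."))
# ['Hola', 'Cómo estás', 'Nos vemos mañana']
```

Because the end-of-sentence class, the whitespace class, and the ¿/¡ class do not overlap, the regex engine has only one way to consume any given run of characters, so long runs of punctuation do not trigger catastrophic backtracking.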
This PR also adds some Spanish test cases. They exercise the sentence splitting logic, but they do not assert the exact result of the splitting.
Fixes #869