dateparser
Timezone parser fails on hyphen-like characters
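When a UTC offset is written with a hyphen-like character instead of the ASCII hyphen-minus (U+002D), the parser ignores that character and produces incorrect output. In the example below, the result is reported as UTC, and "06:00" appears to be read as the time of day rather than as a -06:00 offset. These characters do occur in the wild, for example on Wikipedia at https://en.wikipedia.org/wiki/List_of_UTC_offsets, which uses "−" (U+2212 MINUS SIGN).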
import dateparser
example = dateparser.parse("jan 15th UTC−06:00")  # "−" here is U+2212, not ASCII "-"
print(example)
print(example.tzname())
> 2023-01-15 06:00:00+00:00
> UTC
I believe updating the timezone regex to handle some or all of the "Unicode Dash Characters" listed in Table 6-3 of http://www.unicode.org/versions/latest/ch06.pdf would be quite beneficial. The same table is reproduced on various websites if you'd rather not download the file.
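As a rough illustration of the idea, and as a possible workaround until the regex is updated, the input could be normalized before parsing. This is only a sketch: `normalize_dashes` and `DASH_LIKE` are hypothetical names, not part of dateparser, and the character set is a hand-picked subset of dash-like code points rather than the full Unicode table.

```python
import re

# Assumption: a hand-picked subset of dash-like code points, not the
# complete Unicode table and not dateparser's own timezone regex.
DASH_LIKE = (
    "\u2010"  # HYPHEN
    "\u2011"  # NON-BREAKING HYPHEN
    "\u2012"  # FIGURE DASH
    "\u2013"  # EN DASH
    "\u2014"  # EM DASH
    "\u2015"  # HORIZONTAL BAR
    "\u2212"  # MINUS SIGN (the one used on the Wikipedia page)
    "\uFE63"  # SMALL HYPHEN-MINUS
    "\uFF0D"  # FULLWIDTH HYPHEN-MINUS
)
_DASH_RE = re.compile(f"[{DASH_LIKE}]")


def normalize_dashes(text: str) -> str:
    """Replace dash-like characters with ASCII hyphen-minus (U+002D)."""
    return _DASH_RE.sub("-", text)


# The string from the report, with U+2212 written as an escape.
normalized = normalize_dashes("jan 15th UTC\u221206:00")
print(normalized)
```

The normalized string can then be passed to `dateparser.parse` in place of the original. Note that a blanket replacement like this also rewrites dashes elsewhere in the input (for instance in date ranges), so a fix inside the library would more likely extend only the sign portion of the offset pattern.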