dateutil
Incorrect Parsing of dates with underscore delimiter
If I have the following list of strings:
a = ['Loc_RaffertytoLong_2004_02_21',
'Loc_RaffertytoLong_2004_02_22',
'Loc_RaffertytoLong_2004_02_23',
'Loc_RaffertytoLong_2004_02_24',
'Loc_RaffertytoLong_2004_02_26',
'Loc_RaffertytoLong_2004_02_27',
'Loc_RaffertytoLong_2004_02_28',
'Loc_RaffertytoLong_2004_02_29']
And I try to parse the date using dateutil:
from dateutil import parser as dparse
for i in a:
    print(dparse.parse(i, fuzzy=True))
I get the printout:
2019-02-21 00:00:00
2019-02-22 00:00:00
2019-02-23 00:00:00
2019-02-24 00:00:00
2019-02-26 00:00:00
2019-02-27 00:00:00
2019-02-28 00:00:00
And the error:
ValueError: ('Unknown string format:', 'Loc_RaffertytoLong_2004_02_29')
From the printout, it is not correctly parsing the year from the string. It seems to be returning dates in the year 2019, which explains why it fails to parse a 29 Feb in 2019.
I can get the correct behavior if I replace all underscores with dashes:
for i in a:
    print(dparse.parse('-'.join(i.split('_')), fuzzy=True))
Is there a reason why dateutil doesn't recognize an underscore delimiter?
System information:
# Name Version Build Channel
python-dateutil 2.8.0 py36_0
Windows 10
python 3.6.8
I'm not quite sure what's going on here, a minimal reproducing case would be:
from dateutil.parser import parse
from datetime import datetime
dt = parse("2004_09_17")
assert dt == datetime(2004, 9, 17), dt
When you use fuzzy_with_tokens, we get:
>>> parse('2004_02_04', fuzzy_with_tokens=True)
(datetime.datetime(2019, 2, 4, 0, 0), ('_', '_'))
This is one of a class of bugs where part of a datetime string gets eaten in some way - not included in the datetime or in the fuzzy tokens. Last time I saw this I think it was silently eating things it considered to be time zones, but we added a warning for that. @jbrockmendel any ideas?
By the way @danhamill, if your datetimes are actually in a fixed format like that, I strongly recommend using datetime.strptime instead of dateutil.parser.parse. The dateutil parser is really best reserved for parsing strings that are known to be datetimes but whose format is unknown.
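For illustration, a minimal sketch of the fixed-format approach using only the standard library. It assumes every name ends in `_YYYY_MM_DD`, as in the list above; `date_from_name` is a made-up helper name, not part of any library.

```python
from datetime import datetime


def date_from_name(name):
    # Assumption: the last three underscore-separated fields are
    # year, month and day; everything before them is ignored.
    _, year, month, day = name.rsplit("_", 3)
    return datetime.strptime("_".join((year, month, day)), "%Y_%m_%d")


print(date_from_name("Loc_RaffertytoLong_2004_02_29"))
```

Because the format is spelled out, a string that does not match raises a ValueError instead of silently falling back to the current year.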
Not obvious what's going on here; I'll take a closer look.
I have fixed the problem by adding "_" to the JUMP list. With that change, this test and all the other tests pass. I still don't fully understand why the year sometimes got eaten. I can tidy up and send a PR soon.
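In the meantime, roughly the same effect can probably be had from user code by subclassing `parserinfo` and extending its `JUMP` list. This is only a sketch of the idea, not the patch itself: `UnderscoreInfo` and `parse_underscored` are made-up names, and it assumes the parser treats an added "_" like the other separators in `JUMP`.

```python
from dateutil import parser


class UnderscoreInfo(parser.parserinfo):
    # Hypothetical subclass: treat "_" as a skippable separator,
    # alongside the default JUMP entries such as " ", "-" and "/".
    JUMP = parser.parserinfo.JUMP + ["_"]


def parse_underscored(text):
    # fuzzy=True lets the parser skip the non-date prefix of the name.
    return parser.parse(text, parserinfo=UnderscoreInfo(), fuzzy=True)


print(parse_underscored("Loc_RaffertytoLong_2004_02_29"))
```

This leaves the default parser untouched, so other callers of `parse` keep their current behavior.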
I ran into the same issue today:
import dateutil.parser as dparser
dt = dparser.parse("2019_01_26", fuzzy=True)
print(dt)
-->2022-01-26 00:00:00
Installed version is python_dateutil-2.8.2-py2.py3-none-any.whl