Incorrect Parsing of dates with underscore delimiter

Open danhamill opened this issue 5 years ago • 5 comments

If I have the following list of strings:

a = ['Loc_RaffertytoLong_2004_02_21',
 'Loc_RaffertytoLong_2004_02_22',
 'Loc_RaffertytoLong_2004_02_23',
 'Loc_RaffertytoLong_2004_02_24',
 'Loc_RaffertytoLong_2004_02_26',
 'Loc_RaffertytoLong_2004_02_27',
 'Loc_RaffertytoLong_2004_02_28',
 'Loc_RaffertytoLong_2004_02_29']

And I try to parse the date using dateutil:

from dateutil import parser as dparse
for i in a:
    print(dparse.parse(i,fuzzy=True))

I get the printout:

2019-02-21 00:00:00
2019-02-22 00:00:00
2019-02-23 00:00:00
2019-02-24 00:00:00
2019-02-26 00:00:00
2019-02-27 00:00:00
2019-02-28 00:00:00

And the error:

ValueError: ('Unknown string format:', 'Loc_RaffertytoLong_2004_02_29')

From the printout, it is not correctly parsing the year from the string. It seems to be returning dates in the year 2019, which explains why it fails to parse a 29 Feb in 2019.
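To illustrate the reasoning above, here is a minimal standard-library sketch. It assumes the year was dropped and filled in from the current date (2019 when this was run); `fill_missing_year` is a hypothetical helper written for this illustration, not part of dateutil.

```python
import calendar
from datetime import datetime


def fill_missing_year(month, day, default_year):
    """Hypothetical helper: build a date whose year comes from a default,
    the way a parser might when it did not recognise a year in the input."""
    return datetime(default_year, month, day)


# 2004 is a leap year, 2019 is not.
print(calendar.isleap(2004), calendar.isleap(2019))

# 28 Feb exists in 2019, so substituting the year goes unnoticed.
print(fill_missing_year(2, 28, 2019))

# 29 Feb does not exist in 2019, so constructing the date fails.
try:
    fill_missing_year(2, 29, 2019)
except ValueError as exc:
    print("ValueError:", exc)
```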

I can get the correct behavior if I replace all underscores with hyphens.

for i in a:
    print(dparse.parse('-'.join(i.split('_')),fuzzy=True))

Is there a reason why dateutil doesn't recognize an underscore delimiter?

System information:

# Name                    Version                   Build  Channel
python-dateutil           2.8.0                    py36_0
Windows 10
python 3.6.8

danhamill avatar Aug 30 '19 18:08 danhamill

I'm not quite sure what's going on here. A minimal reproducing case would be:

from dateutil.parser import parse
from datetime import datetime

dt = parse("2004_09_17")
assert dt == datetime(2004, 9, 17), dt

When you use fuzzy_with_tokens, we get:

>>> parse('2004_02_04', fuzzy_with_tokens=True)
(datetime.datetime(2019, 2, 4, 0, 0), ('_', '_'))

This is one of a class of bugs where part of a datetime string gets eaten in some way - not included in the datetime or in the fuzzy tokens. Last time I saw this I think it was silently eating things it considered to be time zones, but we added a warning for that. @jbrockmendel any ideas?

pganssle avatar Aug 30 '19 19:08 pganssle

By the way @danhamill, if your datetimes are actually in a fixed format like that, I strongly recommend using datetime.strptime instead of dateutil.parser.parse. The dateutil parser is really best suited to parsing strings that are known to be datetimes but whose format is unknown.
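A minimal sketch of that fixed-format approach, assuming every name ends in a `YYYY_MM_DD` suffix as in the list from the original post. The helper name `parse_trailing_date` and the regular expression are illustrative assumptions, not part of dateutil.

```python
import re
from datetime import datetime

# Assumption: the date is always the last three underscore-separated fields,
# written as a 4-digit year, 2-digit month and 2-digit day.
_DATE_SUFFIX = re.compile(r"(\d{4}_\d{2}_\d{2})$")


def parse_trailing_date(name):
    """Hypothetical helper: extract the trailing YYYY_MM_DD and parse it
    with an explicit format instead of letting a parser guess."""
    match = _DATE_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no YYYY_MM_DD suffix in {name!r}")
    return datetime.strptime(match.group(1), "%Y_%m_%d")


print(parse_trailing_date("Loc_RaffertytoLong_2004_02_29"))
# prints 2004-02-29 00:00:00
```

Because the format is spelled out, an unexpected name raises an error rather than silently producing a date in the wrong year.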

pganssle avatar Aug 30 '19 19:08 pganssle

It's not obvious what's going on here; I'll take a closer look.

jbrockmendel avatar Aug 30 '19 21:08 jbrockmendel

I have fixed the problem by adding "_" to the JUMP list. It passes this test and all other tests. I still don't fully understand why the year sometimes gets eaten. I can tidy up and send a PR soon.

Cheukting avatar Oct 05 '19 16:10 Cheukting

I ran into the same issue today:

import dateutil.parser as dparser
dt = dparser.parse("2019_01_26", fuzzy=True)
print(dt)

-->2022-01-26 00:00:00

Installed version is python_dateutil-2.8.2-py2.py3-none-any.whl

cyberyu avatar Jul 03 '22 16:07 cyberyu