dateutil
Incorrect parsing of dates with underscore delimiter
If I have the following list of strings:
a = ['Loc_RaffertytoLong_2004_02_21',
 'Loc_RaffertytoLong_2004_02_22',
 'Loc_RaffertytoLong_2004_02_23',
 'Loc_RaffertytoLong_2004_02_24',
 'Loc_RaffertytoLong_2004_02_26',
 'Loc_RaffertytoLong_2004_02_27',
 'Loc_RaffertytoLong_2004_02_28',
 'Loc_RaffertytoLong_2004_02_29']
And I try to parse the date using dateutil:
from dateutil import parser as dparse
for i in a:
    print(dparse.parse(i,fuzzy=True))
I get the printout:
2019-02-21 00:00:00
2019-02-22 00:00:00
2019-02-23 00:00:00
2019-02-24 00:00:00
2019-02-26 00:00:00
2019-02-27 00:00:00
2019-02-28 00:00:00
And the error:
ValueError: ('Unknown string format:', 'Loc_RaffertytoLong_2004_02_29')
From the printout, it is not correctly parsing the year from the string. It seems to be returning dates in the year 2019, which explains why it fails to parse a 29 Feb in 2019.
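The leap-year reasoning above can be checked with the standard library alone. This is only a sketch of the calendar arithmetic, not of dateutil's internals; `year_accepts_feb_29` is a hypothetical helper name introduced for illustration.

```python
import calendar
from datetime import datetime


def year_accepts_feb_29(year):
    """Return True if 29 February is a valid date in the given year."""
    try:
        datetime(year, 2, 29)
    except ValueError:
        # datetime rejects day 29 for February in non-leap years.
        return False
    return True


# 2004 is a leap year, so the last string in the list is a real date.
print(calendar.isleap(2004), year_accepts_feb_29(2004))
# 2019 is not, so a parser that substitutes 2019 for the year must fail.
print(calendar.isleap(2019), year_accepts_feb_29(2019))
```

If the parser drops the 2004 token and falls back to the current year, the first seven strings still produce a (wrong) date, while the eighth has no valid date to produce.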
I can get the correct behavior if I replace all underscores with hyphens.
for i in a:
    print(dparse.parse('-'.join(i.split('_')),fuzzy=True))
Is there a reason why dateutil doesn't recognize an underscore delimiter?
System information:
# Name                    Version                   Build  Channel
python-dateutil           2.8.0                    py36_0
Windows 10
python 3.6.8
I'm not quite sure what's going on here. A minimal reproducing case would be:
from dateutil.parser import parse
from datetime import datetime
dt = parse("2004_09_17")
assert dt == datetime(2004, 9, 17), dt
When you use fuzzy_with_tokens, we get:
>>> parse('2004_02_04', fuzzy_with_tokens=True)
(datetime.datetime(2019, 2, 4, 0, 0), ('_', '_'))
This is one of a class of bugs where part of a datetime string gets eaten in some way - not included in the datetime or in the fuzzy tokens. Last time I saw this I think it was silently eating things it considered to be time zones, but we added a warning for that. @jbrockmendel any ideas?
By the way @danhamill, if your datetimes are actually in a fixed format like that, I strongly recommend using datetime.strptime instead of dateutil.parser.parse. The dateutil parser is really best suited to parsing strings that are known to be datetimes but are in an unknown format.
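For names shaped like the ones in the original report, a standard-library version might look something like the sketch below. It assumes every string ends in a `YYYY_MM_DD` suffix; the regular expression and the helper name `parse_trailing_date` are illustrative and not part of dateutil.

```python
import re
from datetime import datetime

# Assumption: the date is always the trailing YYYY_MM_DD part of the name.
DATE_SUFFIX = re.compile(r"(\d{4}_\d{2}_\d{2})$")


def parse_trailing_date(name):
    """Extract and parse a trailing YYYY_MM_DD date from a string."""
    match = DATE_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no YYYY_MM_DD suffix in {name!r}")
    # strptime parses against an explicit format, so the year cannot be dropped.
    return datetime.strptime(match.group(1), "%Y_%m_%d")


print(parse_trailing_date("Loc_RaffertytoLong_2004_02_29"))
# prints 2004-02-29 00:00:00
```

Because the format is explicit, a string that does not match raises an error instead of silently producing a date in the wrong year.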
Not obvious what's going on here; I'll take a closer look.
I have fixed the problem by adding "_" to the JUMP list. It passes this test and all other tests. I still don't fully understand why part of the string sometimes gets eaten. I can tidy up and send a PR soon.
I ran into the same issue today:
import dateutil.parser as dparser
dt = dparser.parse("2019_01_26", fuzzy=True)
print(dt)
-->2022-01-26 00:00:00
Installed version is python_dateutil-2.8.2-py2.py3-none-any.whl