Strange parser error: search_dates parses "2010 Year" to a date with year of 4033

Open leeprevost opened this issue 2 years ago • 5 comments

Very strange issue.

import datetime

import dateparser
from dateparser.search import search_dates

dateparser.__version__  # '1.1.8'

settings = {
    'RELATIVE_BASE': datetime.datetime(2023, 7, 31, 0, 0),
    'PREFER_DAY_OF_MONTH': 'first',
    'PREFER_DATES_FROM': 'future',
    'REQUIRE_PARTS': ['year', 'month'],
    'DATE_ORDER': 'YMD',
}
s = 'Closing Yield, 2010 Year Treasury notes On Dec 31, 2023'
search_dates(s, settings=settings)

Result: Out[27]:

[('2010 Year', datetime.datetime(4033, 7, 31, 0, 0)),
 ('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0))]

(impossible year 4033 from the first part of the parse)

I also posted this question on Stack Overflow.

leeprevost avatar Oct 16 '23 21:10 leeprevost

This is because “year” is interpreted the same as “years”, and “2010 years” is interpreted as “2010 years later”.
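
The arithmetic behind the reported value can be checked with the standard library alone. This is only a sketch of the interpretation described above, not dateparser's actual code path; `offset_years` and `shifted` are illustrative names.

```python
import datetime

# RELATIVE_BASE from the settings in the report.
relative_base = datetime.datetime(2023, 7, 31, 0, 0)

# Reading "2010 Year" as an offset of 2010 years after the base
# changes only the year field: 2023 + 2010 = 4033.
offset_years = 2010
shifted = relative_base.replace(year=relative_base.year + offset_years)

print(shifted)  # 4033-07-31 00:00:00
```

The result matches the first tuple in the reported output, which is consistent with the phrase being treated as a relative offset rather than a calendar year.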

Maybe we could make it so that the singular “year” only works like that for “1 year”, and otherwise gets translated to, for example, “year 2010”. But it may not be trivial to address.
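
A rough sketch of what such a rule might look like in isolation. `classify_year_phrase` and the labels it returns are hypothetical, written only to illustrate the idea; nothing here exists in dateparser, and a real fix would have to work inside its language-aware parsing.

```python
import re

# Hypothetical pattern: a number followed by "year" or "years", nothing else.
_YEAR_PHRASE = re.compile(r"^\s*(\d+)\s+(year|years)\s*$", re.IGNORECASE)


def classify_year_phrase(text):
    """Return a (kind, number) tuple for "<N> year(s)" phrases, or None if no match."""
    match = _YEAR_PHRASE.match(text)
    if match is None:
        return None
    number = int(match.group(1))
    unit = match.group(2).lower()
    # Proposed rule: singular "year" with a count other than 1 names a calendar year.
    if unit == "year" and number != 1:
        return ("calendar_year", number)
    # "1 year" and any "<N> years" stay relative offsets.
    return ("relative_years", number)
```

Under this sketch, "2010 Year" would be classified as the calendar year 2010, while "1 year" and "2010 years" would remain relative offsets.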

Gallaecio avatar Oct 17 '23 17:10 Gallaecio

OK, thank you. I can work around this now that I know what the rules are. Could you point me to source so that I can see the ruleset? And is that user configurable?
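
One possible workaround, sketched under the assumption that the caller never wants bare "N year(s)" phrases treated as dates: filter them out of the list that `search_dates` returns. `drop_year_count_matches` is a hypothetical helper, not a dateparser function, and the sample `results` list is copied from the output in this report.

```python
import datetime
import re

# Matches a substring that is only a number followed by "year" or "years".
_BARE_YEAR_COUNT = re.compile(r"^\s*\d+\s+years?\s*$", re.IGNORECASE)


def drop_year_count_matches(results):
    """Remove (text, datetime) pairs whose text is just "<N> year(s)"."""
    # search_dates returns None when it finds nothing, so fall back to an empty list.
    return [
        (text, parsed)
        for text, parsed in (results or [])
        if not _BARE_YEAR_COUNT.match(text)
    ]


# Output from the report, used here as sample input.
results = [
    ("2010 Year", datetime.datetime(4033, 7, 31, 0, 0)),
    ("On Dec 31, 2023", datetime.datetime(2023, 12, 31, 0, 0)),
]
filtered = drop_year_count_matches(results)
```

Note that this also discards genuine relative phrases such as "5 years", so it only fits inputs where those are not expected.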

leeprevost avatar Oct 17 '23 18:10 leeprevost

The code base is relatively complex, and I don’t think this case is user configurable at the moment.

Gallaecio avatar Oct 17 '23 18:10 Gallaecio

OK - I thought I saw a definitions page with the regex sequences it was using to parse. But, if not easy, I'll work around this. Want me to close this out?

leeprevost avatar Oct 17 '23 19:10 leeprevost

Want me to close this out?

No, I think this is a valid issue, and we want to eventually address it.

Gallaecio avatar Oct 18 '23 07:10 Gallaecio