dateparser
`dateparser.search.search_dates` does not work as expected
Given the example below:
>>> from dateparser.search import search_dates
>>> search_dates("Validity date: 2021-11-26")
It returns None; however, it should return [('2021-11-26', datetime.datetime(2021, 11, 26, 0, 0))].
Confirmed with master (141199b477cd1ae215207f4d84d0b9e0bae4bba9).
Hi @Gallaecio @angel-langdon,
I tried to debug this issue and found a bug in the detection of the "best_language": the text was detected as "vi" (Vietnamese) instead of "en" (English).
I will try to fix this.
Forcing the detected language to "en" gives the desired result.
@Gallaecio What logic is used for language detection?
If the issue is the language, then this is a non-issue.
@angel-langdon You can either explicitly pass the language of the input text to search_dates if you know it, or pass it a better language detection function.
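If the language is known, `search_dates("Validity date: 2021-11-26", languages=["en"])` is the direct fix. For the second option, here is a minimal sketch, assuming dateparser's `detect_languages_function` hook, which per the dateparser documentation receives the text and a `confidence_threshold` and returns a list of language codes. The heuristic itself (`ascii_as_english`) is hypothetical and deliberately crude: it reports English for pure-ASCII input and makes no guess otherwise.

```python
# Hypothetical custom detector. The (text, confidence_threshold) signature is
# assumed from dateparser's custom language detection docs; check your version.
def ascii_as_english(text, confidence_threshold=0.5):
    """Return ["en"] for pure-ASCII text, else [] (no guess)."""
    # confidence_threshold is accepted for compatibility but ignored here.
    if text.isascii():
        return ["en"]
    return []
```

It would be passed as `search_dates(text, detect_languages_function=ascii_as_english)`. A real detector (for example, one backed by a trained language-identification model) would replace the ASCII check.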
@Gallaecio Of course there is an issue: in this case it does not really matter which language is detected, because there is clearly a date in the text, 2021-11-26. You can close the issue, but the problem is still there...
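For strictly ISO-formatted input like the one in this report, a language-independent search is possible. The sketch below is a hypothetical helper (`search_iso_dates`), not part of dateparser: it only recognises `YYYY-MM-DD`, returns `(substring, datetime)` pairs shaped like `search_dates` output, and returns None when nothing matches, as `search_dates` did here.

```python
import re
from datetime import datetime

# YYYY-MM-DD not embedded in a longer run of digits.
ISO_DATE = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")

def search_iso_dates(text):
    """Hypothetical helper: find ISO 8601 calendar dates (YYYY-MM-DD)."""
    results = []
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        try:
            results.append((match.group(0), datetime(year, month, day)))
        except ValueError:
            # Out-of-range month or day (e.g. 2021-13-40): skip the match.
            continue
    return results or None
```

Because it never consults a language, detecting "vi" versus "en" cannot affect it; the trade-off is that it ignores every other date format.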
@Gallaecio @angel-langdon I just found that search_dates gives the desired result up to "Validity date: 2021-11-12".
After that date, None is returned, maybe because the default date format is YYYY-DD-MM while we are passing the date in YYYY-MM-DD format.
data:image/s3,"s3://crabby-images/88223/882233470b938dd1901d21f83784a4c08ad06a89" alt="Screenshot 2021-10-05 at 1 44 29 PM"
Correct me if I am wrong about the default date format.
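The year-day-month hypothesis can be checked with the standard library alone. Under a `%Y-%d-%m` reading, `2021-11-12` is still a valid date (11 December 2021), while `2021-11-26` is rejected because 26 is not a month. This only illustrates the arithmetic; it does not show which date order dateparser actually applies for a given detected language.

```python
from datetime import datetime

def parse_or_none(value, fmt):
    """Return a datetime if value matches fmt, else None."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

for value in ("2021-11-12", "2021-11-26"):
    ymd = parse_or_none(value, "%Y-%m-%d")  # year-month-day (ISO 8601)
    ydm = parse_or_none(value, "%Y-%d-%m")  # year-day-month
    print(value, "YMD:", ymd, "YDM:", ydm)
# 2021-11-12 YMD: 2021-11-12 00:00:00 YDM: 2021-12-11 00:00:00
# 2021-11-26 YMD: 2021-11-26 00:00:00 YDM: None
```

This matches the observed cutoff: every date up to day 12 survives a day/month swap, and anything later does not.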