
`dateparser.search.search_dates` does not work as expected

Open · angel-langdon opened this issue 2 years ago · 6 comments

Given the following example:

>>> from dateparser.search import search_dates
>>> search_dates("Validity date: 2021-11-26")

It returns None; however, it should return [('2021-11-26', datetime.datetime(2021, 11, 26, 0, 0))].

angel-langdon · Oct 01 '21 09:10

Confirmed with master (141199b477cd1ae215207f4d84d0b9e0bae4bba9).

Gallaecio · Oct 02 '21 06:10

Hi @Gallaecio @angel-langdon, I tried to debug this issue and found that the bug lies in detecting the "best_language": the text is detected as being in the "vi" language instead of "en". [Screenshot 2021-10-05 at 1 23 50 AM]

I will try to fix this.

dishantsethi · Oct 04 '21 20:10

Forcing the detected language to be "en" gives the desired result. [Screenshot 2021-10-05 at 1 25 04 AM]

@Gallaecio What logic is used for language detection?

dishantsethi · Oct 04 '21 20:10

If the issue is the language, then this is a non-issue.

@angel-langdon You can either explicitly pass the language of the input text to search_dates if you know it, or pass it a better language detection function.

Gallaecio · Oct 05 '21 06:10

@Gallaecio Of course there is an issue: in this case it does not really matter which language is detected, because the text clearly contains a date, 2021-11-26. You can close the issue, but the problem is still there.

angel-langdon · Oct 05 '21 07:10
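Because an ISO 8601 date such as 2021-11-26 is language-independent, one possible workaround is to scan for such dates before (or instead of) calling `search_dates`. The helper below, `find_iso_dates`, is hypothetical and not part of dateparser; it recognizes only the `YYYY-MM-DD` form and mimics the shape of `search_dates` output (a list of `(substring, datetime)` tuples, or `None` when nothing is found).

```python
import re
from datetime import datetime

# Matches strings shaped like YYYY-MM-DD; validity is checked separately.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_iso_dates(text):
    """Hypothetical fallback: return [(substring, datetime), ...] or None."""
    results = []
    for match in ISO_DATE.finditer(text):
        try:
            # strptime rejects impossible dates such as month 13 or day 40.
            dt = datetime.strptime(match.group(0), "%Y-%m-%d")
        except ValueError:
            continue
        results.append((match.group(0), dt))
    return results or None


print(find_iso_dates("Validity date: 2021-11-26"))
```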

@Gallaecio @angel-langdon I just found that search_dates gives the desired result up to "Validity date: 2021-11-12". For later dates, None is returned, perhaps because the default date format is YYYY-DD-MM while we are passing the date as YYYY-MM-DD.

[Screenshot 2021-10-05 at 1 44 29 PM]

Correct me if I am wrong about the default date format.

dishantsethi · Oct 05 '21 08:10
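The date-order hypothesis can be illustrated with the standard library alone: read as year-day-month, `2021-11-12` is still a valid date (December 11), whereas `2021-11-26` would need month 26 and fails. This sketch does not confirm what dateparser's default date order actually is, and `parse_as` is a hypothetical helper. If the order turns out to be the cause, dateparser's `DATE_ORDER` setting (for example `settings={"DATE_ORDER": "YMD"}`) is the knob to try.

```python
from datetime import datetime


def parse_as(text, fmt):
    """Hypothetical helper: parse `text` with a strptime format, or return None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


for text in ["2021-11-12", "2021-11-26"]:
    ymd = parse_as(text, "%Y-%m-%d")  # year-month-day (ISO 8601)
    ydm = parse_as(text, "%Y-%d-%m")  # year-day-month (the suspected default)
    print(text, "YMD:", ymd, "YDM:", ydm)
```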