dateparser

Parsing DMY, then try YMD - not possible

Open elgehelge opened this issue 6 years ago • 7 comments

It seems to be impossible to get the following behaviour:

parse("01-02-03") == datetime.datetime(2003, 2, 1, 0, 0)
parse("2003-02-01") == datetime.datetime(2003, 2, 1, 0, 0)

The problem is that whenever the first line evaluates to true, the second line produces a date where day and month are swapped (like so: datetime.datetime(2003, 1, 2, 0, 0)).

This is the preferred way of parsing dates in all Northern European countries, btw.

elgehelge avatar May 09 '18 07:05 elgehelge

I think the core of the problem is this behaviour:

>>> dateparser.parse("2003-02-01", settings={'DATE_ORDER': 'DMY'})
datetime.datetime(2003, 1, 2, 0, 0)

elgehelge avatar May 09 '18 07:05 elgehelge

Thank you for reporting that, @elgehelge. I have reproduced the problem. As a temporary workaround, you can use the date_formats argument, as described in http://dateparser.readthedocs.io/en/latest/index.html#usage:

parse("2003-02-01", date_formats=["%Y-%m-%d"])

The choice of MDY as the default format has been discussed before. I'll bring back the core argument (backed by general observations, not data): the default date settings on most web servers are still English (United States).

asadurski avatar May 09 '18 07:05 asadurski

The reason I would like to use the dateparser library is that I don't know the date format. Your workaround will fail on the first example ("01-02-03").

elgehelge avatar May 09 '18 08:05 elgehelge

The date_formats list can be extended, so:

parse("01-02-03", date_formats=["%Y-%m-%d", "%d-%m-%y"])

But I understand that if you have a multitude of formats, a setting that doesn't work is a serious limitation.

asadurski avatar May 09 '18 09:05 asadurski
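
For readers who want to see what "a list of formats tried in order" amounts to, here is a minimal standard-library sketch of the idea behind the date_formats workaround. It is not dateparser's implementation; `parse_with_formats` and `FORMATS` are hypothetical names, and it relies only on `datetime.strptime` rejecting input that does not fit a format.

```python
from datetime import datetime


def parse_with_formats(text, formats):
    """Return the first successful strptime parse, or None if no format fits.

    Hypothetical helper for illustration; not part of dateparser.
    """
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format does not match the input; try the next one.
            continue
    return None


# The two formats from the workaround above: ISO first, then two-digit-year DMY.
FORMATS = ["%Y-%m-%d", "%d-%m-%y"]

print(parse_with_formats("2003-02-01", FORMATS))  # 2003-02-01 00:00:00
print(parse_with_formats("01-02-03", FORMATS))    # 2003-02-01 00:00:00
```

This works here because %Y requires four digits, so "01-02-03" cannot match the ISO format and falls through to "%d-%m-%y". As noted in the thread, the approach only scales as far as the list of formats you are willing to enumerate.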

Let me just give you a little more insight into the problem. I figured out that pandas would actually work for my use case:

import pandas
parse = lambda string: pandas.to_datetime(string, dayfirst=True).to_pydatetime()

However, as it turns out, Sweden writes its dates differently: they prefer YMD over DMY. Neither pandas nor dateutil was able to handle this use case using dayfirst and yearfirst. So I guess what I really need is a monthinmiddle setting. Put differently, the dayfirst and yearfirst interface is just "broken" in my opinion, so don't try to go down that road.

elgehelge avatar May 09 '18 10:05 elgehelge
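
The "monthinmiddle" idea above could be sketched roughly as follows, using only the standard library. No such setting exists in dateparser, pandas, or dateutil as far as this thread shows; `parse_month_in_middle` is a hypothetical helper. It assumes purely numeric dates with three fields, treats a four-digit first field as the year (YMD), otherwise reads the input as DMY, and maps two-digit years to 20xx (an arbitrary assumption made for the example).

```python
import re
from datetime import datetime


def parse_month_in_middle(text):
    """Parse a three-field numeric date whose middle field is the month.

    Hypothetical sketch: accepts YMD (four-digit year first) or DMY only.
    """
    parts = re.split(r"[-/.]", text.strip())
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric fields: {text!r}")

    first, month, last = parts
    if len(first) == 4:
        # A four-digit first field can only be a year: YMD.
        year, day = first, last
    else:
        # Otherwise assume the day comes first: DMY.
        day, year = first, last

    year_number = int(year)
    if len(year) == 2:
        # Assumption for this sketch: two-digit years belong to 20xx.
        year_number += 2000

    # datetime() itself raises ValueError for impossible dates (e.g. month 13).
    return datetime(year_number, int(month), int(day))


print(parse_month_in_middle("01-02-03"))    # 2003-02-01 00:00:00
print(parse_month_in_middle("2003-02-01"))  # 2003-02-01 00:00:00
```

A two-digit YMD date such as "03-02-01" stays ambiguous under this rule and would be read as DMY, which is the limitation any heuristic of this kind has to accept.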

Any update on this issue? It seems major to me.

I need to mass parse UK dates, which are DMY (and there's no locale), but I can't if that breaks ISO dates. People with a date format set in their locale flat out can't read ISO dates without disabling the locale first:

>>> dateparser.parse('le 2000-01-02')
datetime.datetime(2000, 2, 1, 0, 0)

crusaderky avatar Sep 15 '20 10:09 crusaderky

I also have this issue and can confirm it is still present in 1.1.0. A monthmiddle setting seems much more relevant than specifying a date order in the many cases where you don't know the format. One alternative would be to accept a list of orders to try. Another is that if DATE_ORDER is specified, parsing should fail when that order cannot be parsed, so that the user of the library knows and can implement their own fallback.

fish-face avatar Jan 16 '22 18:01 fish-face
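
The last two alternatives (a list of orders to try, and strict failure when an order does not fit) can be combined in user code. The sketch below is an illustration built on the standard library, not dateparser behaviour: `parse_strict` and `parse_with_orders` are hypothetical names, the order strings reuse the DMY/YMD/MDY letters from DATE_ORDER, and only numeric dates with three fields are handled.

```python
import re
from datetime import datetime

# strptime directives for the day and month fields; the year is chosen per token.
DIRECTIVES = {"D": "%d", "M": "%m"}


def parse_strict(text, order):
    """Parse text in exactly the given order (e.g. 'DMY') or raise ValueError."""
    parts = re.split(r"[-/.]", text.strip())
    if len(parts) != 3:
        raise ValueError(f"expected three fields: {text!r}")

    fmt_parts = []
    for field, token in zip(order, parts):
        if field == "Y":
            # Four digits means a full year; anything else must be a two-digit year.
            fmt_parts.append("%Y" if len(token) == 4 else "%y")
        else:
            fmt_parts.append(DIRECTIVES[field])

    # strptime raises ValueError when a token does not fit its field.
    return datetime.strptime("-".join(parts), "-".join(fmt_parts))


def parse_with_orders(text, orders):
    """Try each order in turn; raise ValueError if none of them fits."""
    for order in orders:
        try:
            return parse_strict(text, order)
        except ValueError:
            continue
    raise ValueError(f"{text!r} matches none of the orders {orders}")


print(parse_with_orders("01-02-03", ["DMY", "YMD"]))    # 2003-02-01 00:00:00
print(parse_with_orders("2003-02-01", ["DMY", "YMD"]))  # 2003-02-01 00:00:00
```

Because "2003" cannot be a day, the DMY attempt fails loudly on the ISO date and the YMD fallback takes over, which is the behaviour requested at the top of this issue. The order of the list still matters: with ["YMD", "DMY"], "01-02-03" would be read as 2001-02-03.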