dateparser
Parsing DMY, then trying YMD: not possible
It seems to be impossible to get the following behaviour:
parse("01-02-03") == datetime.datetime(2003, 2, 1, 0, 0)
parse("2003-02-01") == datetime.datetime(2003, 2, 1, 0, 0)
The problem is that whenever the first line evaluates to True, the second line produces a date with day and month swapped (datetime.datetime(2003, 1, 2, 0, 0)).
This is the preferred way of parsing dates in all Northern European countries, by the way.
I think the core of the problem is this behaviour:
>>> dateparser.parse("2003-02-01", settings={'DATE_ORDER': 'DMY'})
datetime.datetime(2003, 1, 2, 0, 0)
Thank you for reporting that, @elgehelge. I have reproduced the problem.
As a temporary workaround, you can use the date_formats argument, as described in http://dateparser.readthedocs.io/en/latest/index.html#usage
parse("2003-02-01", date_formats=["%Y-%m-%d"])
The choice of MDY as the default format has been discussed before. To restate the core argument (based on general observations, not data): the default date settings on most web servers are still English (United States).
The reason I would like to use the dateparser library is that I don't know the date format in advance. Your workaround fails on the first example ("01-02-03").
The date_formats list can be extended, so:
parse("01-02-03", date_formats=["%Y-%m-%d", "%d-%m-%y"])
But I understand that if you have a multitude of formats, a setting that doesn't work is a serious limitation.
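For readers unfamiliar with the mechanism, the following stdlib-only sketch shows roughly what trying a list of formats in order amounts to. `parse_with_formats` is a hypothetical helper, not dateparser code; it returns None when nothing matches, and dateparser's own behaviour when no format matches may differ.

```python
from datetime import datetime


def parse_with_formats(text, formats):
    """Return the first successful strptime() parse of text, trying formats in order."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # nothing matched (dateparser.parse also returns None on failure)


# strptime's %Y requires four digits, so "01-02-03" falls through to "%d-%m-%y".
print(parse_with_formats("01-02-03", ["%Y-%m-%d", "%d-%m-%y"]))    # 2003-02-01 00:00:00
print(parse_with_formats("2003-02-01", ["%Y-%m-%d", "%d-%m-%y"]))  # 2003-02-01 00:00:00
```

For these two inputs the formats never compete: "%Y" rejects a two-digit first field and "%d" rejects a four-digit one.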
Let me just give you a little more insight into the problem. I figured out that pandas would actually work for my use case:
import pandas
parse = lambda string: pandas.to_datetime(string, dayfirst=True).to_pydatetime()
However, as it turns out, Sweden writes dates differently: it prefers YMD over DMY. Neither pandas nor dateutil was able to handle this use case using dayfirst and yearfirst. So I guess what I really need is a monthinmiddle setting. Put differently, dayfirst and yearfirst are a "broken" interface in my opinion, so don't go down that road.
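To make the monthinmiddle idea concrete, here is a minimal stdlib sketch under stated assumptions: `parse_month_in_middle` is a hypothetical helper (not an option of dateparser, pandas or dateutil), it only handles purely numeric dates separated by `-`, `/` or `.`, and it assumes a two-digit year is always the last field.

```python
import re
from datetime import datetime

# Hypothetical helper illustrating the proposed "monthinmiddle" behaviour.
_NUMERIC_DATE = re.compile(r"^(\d{1,4})[-/.](\d{1,2})[-/.](\d{1,4})$")


def parse_month_in_middle(text):
    """Parse a numeric date whose middle field is always the month.

    Assumption: a four-digit first field means YMD; otherwise the last field
    is the year (DMY). Returns None for other shapes or invalid dates.
    """
    match = _NUMERIC_DATE.match(text.strip())
    if not match:
        return None
    first, month, last = match.groups()
    if len(first) == 4:
        fmt = "%Y-%m-%d"  # year first: YMD, e.g. "2003-02-01"
    elif len(last) in (2, 4) and len(first) <= 2:
        fmt = "%d-%m-%Y" if len(last) == 4 else "%d-%m-%y"  # day first: DMY
    else:
        return None  # e.g. a three-digit field: ambiguous, refuse to guess
    try:
        return datetime.strptime(f"{first}-{month}-{last}", fmt)
    except ValueError:
        return None  # right shape, but not a real calendar date


print(parse_month_in_middle("01-02-03"))    # 2003-02-01 00:00:00
print(parse_month_in_middle("2003-02-01"))  # 2003-02-01 00:00:00
```

The two-digit-year assumption is the weak point: a Swedish short form such as "03-02-01" is read as DMY (2001-02-03), so a real setting would still need a preference for that case.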
Any update on this issue? It seems major to me.
I need to bulk-parse UK dates, which are DMY (with no locale information), but I can't do that if it breaks ISO dates. People who have a date order set through their locale simply can't parse ISO dates without disabling the locale first:
>>> dateparser.parse('le 2000-01-02')
datetime.datetime(2000, 2, 1, 0, 0)
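One possible workaround for the locale problem above is to pull out an unambiguous ISO 8601 calendar date (YYYY-MM-DD) before any locale- or DATE_ORDER-driven parsing runs. This is a sketch under assumptions: `parse_iso_first` is a hypothetical wrapper, the `fallback` callable stands in for whatever parser you would otherwise call (for example `lambda s: dateparser.parse(s, settings={'DATE_ORDER': 'DMY'})`), and when an ISO date is found, any time of day or surrounding text is ignored.

```python
import re
from datetime import datetime

# Hypothetical wrapper: YYYY-MM-DD is unambiguous, so handle it before the
# locale-aware parser gets a chance to swap day and month.
_ISO_DATE = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


def parse_iso_first(text, fallback):
    """Return the ISO date embedded in text, else whatever fallback(text) returns."""
    match = _ISO_DATE.search(text)
    if match:
        year, month, day = map(int, match.groups())
        try:
            return datetime(year, month, day)
        except ValueError:
            pass  # looked like ISO but is not a valid date; let the fallback decide
    return fallback(text)


print(parse_iso_first("le 2000-01-02", lambda s: None))  # 2000-01-02 00:00:00
```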
I also have this issue and can confirm it is still present in 1.1.0. A monthmiddle setting seems much more relevant than specifying a date order in the many cases where you don't know the format. One alternative would be to accept a list of orders to try. Another is that, if DATE_ORDER is specified, parsing should fail when the input cannot be parsed in that order, so that users of the library can detect this and implement their own fallback.
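The last two suggestions can be combined: try a list of field orders in turn, and treat an order as failed (rather than silently swapping day and month) when the fields don't fit it. This is a hypothetical sketch: `parse_with_orders` and its `orders` argument are illustrative names, not dateparser settings, and it only handles numeric dates with `-`, `/` or `.` separators. Passing a single order gives the strict behaviour, raising ValueError instead of guessing so the caller can fall back.

```python
import re
from datetime import datetime

# Hypothetical sketch; orders are assumed to be permutations of "DMY".
_FIELDS = re.compile(r"^(\d+)[-/.](\d+)[-/.](\d+)$")
_DIRECTIVES = {"D": "%d", "M": "%m"}


def parse_with_orders(text, orders=("DMY", "YMD")):
    """Try each field order in turn; raise ValueError if none yields a valid date."""
    match = _FIELDS.match(text.strip())
    if not match:
        raise ValueError(f"not a numeric date: {text!r}")
    fields = match.groups()
    for order in orders:
        directives = []
        for letter, field in zip(order, fields):
            if letter == "Y":
                # Only a two- or four-digit field may be read as a year.
                if len(field) not in (2, 4):
                    break
                directives.append("%Y" if len(field) == 4 else "%y")
            else:
                directives.append(_DIRECTIVES[letter])
        else:  # no break: every field got a directive, so try this order
            try:
                return datetime.strptime("-".join(fields), "-".join(directives))
            except ValueError:
                pass  # fields do not fit this order; move on to the next one
    raise ValueError(f"{text!r} does not fit any of the orders {orders}")


print(parse_with_orders("01-02-03"))    # 2003-02-01 00:00:00 (read as DMY)
print(parse_with_orders("2003-02-01"))  # 2003-02-01 00:00:00 (DMY fails, YMD fits)
```

With the default orders this reproduces both lines of the behaviour requested at the top of the thread, while `orders=("DMY",)` on "2003-02-01" raises instead of returning a swapped date.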