orange3

CSV File Import mangles dates - locale?

[Open] shakeshuck opened this issue 2 years ago • 2 comments

What's wrong? Importing a CSV file with UK dates and leaving the type set to 'auto' results in some dates being interpreted in American (month-first) format (those where the day of the month is less than 13) and others not. I could set the type to 'text', but then the column order is rearranged, which is also not ideal.

How can we reproduce the problem? Import a column containing, e.g., "28/03/18" and "08/03/18":

  • "28/03/18" becomes 2018-03-28
  • "08/03/18" becomes 2018-08-03

What's your environment?

  • Operating system: Linux - OpenSUSE Tumbleweed
  • Orange version: 3.35
  • How you installed Orange: Orange3 installation was via pip

shakeshuck, Jul 07 '23 13:07

In #6539, I tried to address the issue, but even though I changed the implementation, the problem still persists because of how pandas parses dates. pandas tries to guess the datetime format from the values in a column and then parses all of them with that one format. When pandas cannot recognize the format, it falls back to the dateutil parser, which parses each date separately; this can lead to dates in the same column being interpreted with different formats. That is what happens here.
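To illustrate why parsing each value separately gives inconsistent results, here is a minimal sketch of a per-value heuristic. It is a simplified, hypothetical stand-in (the name `parse_per_value` is made up), loosely modelled on how dateutil resolves `a/b/c` dates with its default `dayfirst=False`; it is not the actual pandas or dateutil code.

```python
from datetime import date, datetime

def parse_per_value(text: str) -> date:
    """Hypothetical per-value heuristic, NOT pandas' or dateutil's code.

    Each value is inspected on its own: it is read day-first only when the
    first field cannot be a month (> 12); otherwise month-first is assumed.
    """
    first_field = int(text.split("/")[0])
    fmt = "%d/%m/%y" if first_field > 12 else "%m/%d/%y"
    return datetime.strptime(text, fmt).date()

column = ["28/03/18", "08/03/18"]
print([parse_per_value(v).isoformat() for v in column])
# ['2018-03-28', '2018-08-03']
```

Because each value is decided independently, two dates from the same column end up with different field orders, which matches the behaviour in the report.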

I suggest adding an option to specify the datetime format (as we did in Edit Domain), but I would first wait for the File and CSV Import widgets to be merged and then implement this in a single widget. What do you think, @janezd and @markotoplak?
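A user-specified format removes the ambiguity because one format is applied to the whole column. The sketch below assumes strptime-style format strings; `parse_with_format` is a hypothetical helper, not Orange's or Edit Domain's API.

```python
from datetime import date, datetime
from typing import List

def parse_with_format(values: List[str], fmt: str) -> List[date]:
    """Parse every value with one user-chosen format (hypothetical helper).

    The whole column is interpreted consistently; a value that does not
    match the format raises ValueError instead of being silently reordered.
    """
    return [datetime.strptime(v, fmt).date() for v in values]

column = ["28/03/18", "08/03/18"]
print([d.isoformat() for d in parse_with_format(column, "%d/%m/%y")])
# ['2018-03-28', '2018-03-08']
```

With the UK format `"%d/%m/%y"`, "08/03/18" becomes 8 March; asking for `"%m/%d/%y"` on the same column fails loudly on "28/03/18", which is arguably preferable to mixed results.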

Meanwhile, when datetimes are not parsed correctly, I suggest reading them as text and converting them with the Edit Domain widget.

PrimozGodec, Aug 18 '23 10:08

Reopening, since it is only partially solved. As already discussed in #6539, there are two possible solutions:

  • Try parsing with all the formats we currently support, and fall back to the default parser if none of them works (since dateutil supports more formats than we do). We would need to test how time-consuming this is.
  • An even better solution would be to allow users to specify the format (a dropdown with the supported formats and perhaps an option to enter a custom format).
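The first option could look roughly like the sketch below: try each candidate format against every value in the column, keep the first one that parses all of them, and return `None` so the caller can fall back to the default parser. `CANDIDATE_FORMATS` and both function names are hypothetical; Orange's actual list of supported formats may differ.

```python
from datetime import date, datetime
from typing import List, Optional

# Hypothetical candidate list; the order encodes a preference when a
# column is ambiguous (e.g. every day <= 12 matches both d/m and m/d).
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%y", "%m/%d/%y", "%d/%m/%Y", "%m/%d/%Y"]

def infer_column_format(values: List[str],
                        formats: List[str] = CANDIDATE_FORMATS) -> Optional[str]:
    """Return the first format that parses *every* value, or None."""
    if not values:
        return None
    for fmt in formats:
        try:
            for v in values:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # this format fails on some value; try the next one
        return fmt
    return None

def parse_column(values: List[str]) -> Optional[List[date]]:
    """Parse with one inferred format; None means 'fall back to default'."""
    fmt = infer_column_format(values)
    if fmt is None:
        return None
    return [datetime.strptime(v, fmt).date() for v in values]

print(infer_column_format(["28/03/18", "08/03/18"]))  # %d/%m/%y
```

In the worst case this makes `len(formats) * len(values)` strptime calls per column, which is the cost the comment suggests measuring. A column where every day is 12 or less matches both day-first and month-first, so the list order silently decides; that is one reason an explicit format option is still useful.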

PrimozGodec, Aug 25 '23 08:08