CSV export of extracted dates not in ISO 8601

Open jze opened this issue 3 years ago • 1 comments
When extracted data is exported to CSV, fields containing dates are not converted to ISO 8601.

The command

frictionless extract https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json

shows the correct dates in ISO 8601 format. However, when I export the data to CSV

frictionless extract --csv https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json 

the unconverted dates are returned.

version 5.0.0b9

jze avatar Oct 21 '22 09:10 jze

Hi @roll

The date cell value is parsed into a datetime object, and its default string representation is ISO format: https://github.com/python/cpython/blob/3.8/Lib/datetime.py#L976 https://github.com/frictionlessdata/framework/blob/main/frictionless/fields/date.py#L47

In the extract function we use 'to_list' for the field mapping, so for csv, json and yaml it works fine. For the default format, however, 'supported_types = None' in 'row.to_list()' here: https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84

so no type conversion takes place, and a datetime object is returned whose str representation is ISO format (as noted above): https://github.com/frictionlessdata/framework/blob/main/frictionless/table/row.py#L240
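To illustrate the difference described above, here is a minimal sketch using only the Python standard library. It does not call frictionless; the format string and the sample value are assumptions chosen to match the dates discussed in this issue.

```python
from datetime import datetime

# Assumed field format for illustration (day.month.four-digit year).
FIELD_FORMAT = "%d.%m.%Y"

# A raw cell value as it might appear in the source data.
raw_cell = "01.01.2018"

# Parsing yields a date object, independent of the original text format.
parsed = datetime.strptime(raw_cell, FIELD_FORMAT).date()

# Without type conversion, printing the object uses its str(), which is ISO 8601.
as_object_text = str(parsed)

# Converting back with the field's format reproduces the original text.
as_schema_text = parsed.strftime(FIELD_FORMAT)

print(as_object_text)
print(as_schema_text)
```

The first value corresponds to what the default output shows, the second to what ends up in a format that serializes cells according to the field's format.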

Solution: for the default format, passing 'types=[]' triggers the format check for the fields here: https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84

data.append([cell if cell is not None else "" for cell in row.to_list(types=[])])

but I am not sure if that is the right solution so wanted your feedback before making changes. Thanks!

shashigharti avatar Nov 17 '22 08:11 shashigharti

Hi @jze,

The difference between the two examples might be due to a typo in the schema (https://opendatarepo.lsh.uni-kiel.de/schema/windkraftanlagen.schema.json): the format is %d.%m.%y instead of %d.%m.%Y
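For reference, a small sketch of how the two directives differ in the Python standard library. It only demonstrates strptime/strftime behaviour and makes no claim about how frictionless handles the mismatch; the sample value is an assumption based on the dates quoted in this thread.

```python
from datetime import datetime

# Sample value with a four-digit year (assumed for illustration).
value = "01.01.2018"

# %Y expects a four-digit year, so the value parses cleanly.
parsed_full = datetime.strptime(value, "%d.%m.%Y").date()

# %y expects a two-digit year: it consumes "20" and leaves "18" unparsed,
# which the standard library rejects with a ValueError.
try:
    datetime.strptime(value, "%d.%m.%y")
    two_digit_parse_failed = False
except ValueError:
    two_digit_parse_failed = True

# Formatting differs as well: %y keeps only the last two digits of the year.
short_year_text = parsed_full.strftime("%d.%m.%y")
full_year_text = parsed_full.strftime("%d.%m.%Y")

print(two_digit_parse_failed, short_year_text, full_year_text)
```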

In general, the framework outputs data compatible with the provided schema. So it prints e.g. 01.01.2018 because that is the format of this field.

I'll rename this issue to make it a feature request. We're going to review the mechanics behind this in v6 as we get closer to providing frictionless convert. Currently, it's an open question whether frictionless extract should return "normalized" data or not.

roll avatar Jan 11 '23 14:01 roll