CSV export of extracted dates not in ISO 8601
When exporting extracted data to CSV, fields containing dates are not converted to ISO 8601.
The command
frictionless extract https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json
shows the correct dates in ISO 8601 format. However, when I export the data to CSV with
frictionless extract --csv https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json
the unconverted dates are returned.
frictionless version: 5.0.0b9
Hi @roll
The date cell value is parsed into a Python date object (from the datetime module), and its default string representation is ISO format: https://github.com/python/cpython/blob/3.8/Lib/datetime.py#L976 https://github.com/frictionlessdata/framework/blob/main/frictionless/fields/date.py#L47
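This can be seen with the standard library alone (a minimal illustration, not frictionless code): calling str() on a date delegates to isoformat(), whereas the schema's custom notation only appears when the value is explicitly formatted with strftime.

```python
from datetime import date

# A parsed date cell ends up as a datetime.date object
cell = date(2018, 1, 1)

# Default string representation: ISO 8601 (date.__str__ calls isoformat())
iso_text = str(cell)

# Serializing with a custom field format restores the original notation
custom_text = cell.strftime("%d.%m.%Y")

print(iso_text)     # 2018-01-01
print(custom_text)  # 01.01.2018
```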
In the extract function we use 'to_list' to do the field mapping, so CSV, JSON and YAML work fine. But for the default format, 'supported_types = None' here, in 'row.to_list()':
https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84
so no type conversion takes place and the native date object is returned, whose str representation is ISO format, as noted above: https://github.com/frictionlessdata/framework/blob/main/frictionless/table/row.py#L240
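The mechanism described above can be imitated in plain Python. This is a hypothetical sketch: 'to_list_sketch', its arguments and the field name 'example_date' are made-up names that only mimic the behaviour described for 'Row.to_list', not its real implementation. With 'types=None' the native objects come back untouched; with a list of supported types, any date cell whose type is not in that list is serialized using the field's format.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def to_list_sketch(
    cells: List[Any],
    fields: List[Dict[str, str]],
    types: Optional[List[str]] = None,
) -> List[Any]:
    """Hypothetical imitation of the row-to-list conversion described in the issue."""
    if types is None:
        # No supported types declared: return native Python objects as-is
        return list(cells)
    result = []
    for cell, field in zip(cells, fields):
        if field["type"] == "date" and "date" not in types and isinstance(cell, date):
            # Target cannot store dates natively: serialize with the field format
            result.append(cell.strftime(field.get("format", "%Y-%m-%d")))
        else:
            result.append(cell)
    return result


# Hypothetical field descriptor with a custom date format
fields = [{"name": "example_date", "type": "date", "format": "%d.%m.%Y"}]
cells = [date(2018, 1, 1)]
print(to_list_sketch(cells, fields))            # [datetime.date(2018, 1, 1)]
print(to_list_sketch(cells, fields, types=[]))  # ['01.01.2018']
```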
Solution:
For the default format, passing 'types=[]' would make 'to_list' convert each field value according to its format here:
https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84
data.append([cell if cell is not None else "" for cell in row.to_list(types=[])])
but I am not sure whether that is the right solution, so I wanted your feedback before making changes. Thanks!
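To compare the current default output with the proposed 'types=[]' change, here is a sketch built around the same list comprehension. 'StubRow' and 'build_data' are hypothetical stand-ins, not frictionless classes, and the stub only models date cells (a real row would also convert other types).

```python
from datetime import date
from typing import Any, List, Optional


class StubRow:
    """Hypothetical stand-in for a frictionless Row; only date cells are modelled."""

    def __init__(self, cells: List[Any], date_format: str = "%d.%m.%Y") -> None:
        self.cells = cells
        self.date_format = date_format

    def to_list(self, types: Optional[List[str]] = None) -> List[Any]:
        if types is None:
            # No supported types given: native Python objects are returned
            return list(self.cells)
        # "date" not supported by the target: serialize dates with the field format
        return [
            cell.strftime(self.date_format)
            if isinstance(cell, date) and "date" not in types
            else cell
            for cell in self.cells
        ]


def build_data(rows: List[StubRow], types: Optional[List[str]] = None) -> List[List[Any]]:
    data = []
    for row in rows:
        # Same expression as the proposed fix; missing cells become ""
        data.append([cell if cell is not None else "" for cell in row.to_list(types=types)])
    return data


rows = [StubRow([date(2018, 1, 1), 3.2]), StubRow([None, 2.5])]
current = build_data(rows)             # default output today: date objects
proposed = build_data(rows, types=[])  # with the change: field-formatted strings
print(current)   # [[datetime.date(2018, 1, 1), 3.2], ['', 2.5]]
print(proposed)  # [['01.01.2018', 3.2], ['', 2.5]]
```

Note that this change would make the default output match the CSV output (field format) rather than make the CSV output ISO 8601.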
Hi @jze,
The difference between the two examples might be caused by a typo in the schema, https://opendatarepo.lsh.uni-kiel.de/schema/windkraftanlagen.schema.json, which uses the format %d.%m.%y instead of %d.%m.%Y.
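The effect of such a typo can be checked with Python's own strptime/strftime (a standard-library sketch, not the framework's parsing code): '%y' is a two-digit year, so it cannot consume a four-digit year such as 2018, while '%Y' can.

```python
from datetime import date, datetime

text = "01.01.2018"

# Intended pattern: four-digit year
parsed = datetime.strptime(text, "%d.%m.%Y").date()
print(parsed)  # 2018-01-01

# Pattern with the typo: %y consumes only two digits ("20"),
# leaving "18" unmatched, so parsing fails
try:
    datetime.strptime(text, "%d.%m.%y")
    typo_parses = True
except ValueError as exc:
    typo_parses = False
    print(exc)  # unconverted data remains: 18

# Writing a date back with %y drops the century
print(date(2018, 1, 1).strftime("%d.%m.%y"))  # 01.01.18
```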
In general, the framework outputs data that conforms to the provided schema, so it prints e.g. 01.01.2018 because that is the format declared for this field.
I'll rename this issue to make it a feature request. We're going to review the mechanics behind this in v6, as we're getting closer to providing frictionless convert. Currently, it's an open question whether frictionless extract should return "normalized" data or not.
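The two behaviours being weighed can be sketched as follows, assuming a date field with a custom format. The field name 'example_date', the helper 'export_cell' and its 'normalize' flag are hypothetical and are not existing frictionless options: schema-compatible output re-applies the field's own format, while normalized output always emits ISO 8601.

```python
from datetime import date
from typing import Any, Dict

# Hypothetical field descriptor, modelled on a Table Schema date field
field: Dict[str, str] = {"name": "example_date", "type": "date", "format": "%d.%m.%Y"}


def export_cell(cell: Any, field: Dict[str, str], normalize: bool = False) -> str:
    """Serialize one cell for a text format such as CSV (hypothetical helper)."""
    if cell is None:
        return ""
    if field["type"] == "date" and isinstance(cell, date):
        if normalize:
            return cell.isoformat()  # normalized output: ISO 8601
        return cell.strftime(field["format"])  # schema-compatible output
    return str(cell)


value = date(2018, 1, 1)
print(export_cell(value, field))                  # 01.01.2018
print(export_cell(value, field, normalize=True))  # 2018-01-01
```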