framework
CSV export of extracted dates not in ISO 8601
When exporting extracted data to CSV, fields containing dates are not converted to ISO 8601.
The command
frictionless extract https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json
shows the correct dates in ISO 8601 format. However, when I export the data to CSV
frictionless extract --csv https://opendata.schleswig-holstein.de/dataset/12fb2027-d2d3-42c9-8774-34a70f584c0f/resource/602e974a-bed0-4ffc-b8a1-0f4744b23917/download/windkraftanlagen-2022-07-01.json
the unconverted dates are returned.
version 5.0.0b9
Hi @roll,
The date cell value is parsed into a Python date object, and the default string representation of that object is its ISO format: https://github.com/python/cpython/blob/3.8/Lib/datetime.py#L976 https://github.com/frictionlessdata/framework/blob/main/frictionless/fields/date.py#L47
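The standard-library behaviour referred to above can be checked directly: a value parsed with a day-first format such as '%d.%m.%Y' becomes a datetime.date, and calling str() on it yields the ISO 8601 form. The sample value "01.07.2022" is an assumption chosen for illustration; this uses only Python's datetime module, not frictionless.

```python
from datetime import date, datetime

# Parse a day-first date string (sample value assumed for illustration)
parsed: date = datetime.strptime("01.07.2022", "%d.%m.%Y").date()

# date.__str__ is isoformat(), so printing the object gives ISO 8601
print(str(parsed))         # 2022-07-01
print(parsed.isoformat())  # 2022-07-01
```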
In the extract function we use 'to_list' to do the field mapping, so for CSV, JSON and YAML it works fine. For the default format, however, 'row.to_list()' is called with 'supported_types = None' here:
https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84
so no type conversion takes place and the date object itself is returned, whose string representation is ISO format (as mentioned above): https://github.com/frictionlessdata/framework/blob/main/frictionless/table/row.py#L240
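The two code paths can be modelled with a simplified, hypothetical stand-in for 'row.to_list()'. This is not the frictionless implementation: the 'Field' class, its 'write' method and the 'to_list' function below are assumptions made for illustration, as are the field name and sample values. In this model, cells whose field type appears in 'types' stay native Python objects, any other cell is written out as text using the field's format, and 'types=None' means nothing is converted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; not the frictionless class."""

    name: str
    type: str
    format: Optional[str] = None

    def write(self, cell: Any) -> Any:
        # Serialize a native value back to text using the field's format
        if cell is None:
            return None
        if self.type == "date" and self.format:
            return cell.strftime(self.format)
        return str(cell)


def to_list(
    fields: List[Field], cells: List[Any], types: Optional[List[str]] = None
) -> List[Any]:
    """Simplified model: write out only cells whose field type is not in 'types'."""
    if types is None:
        # No conversion at all: native Python objects are returned as-is
        return list(cells)
    return [
        cell if field.type in types else field.write(cell)
        for field, cell in zip(fields, cells)
    ]


fields = [Field("name", "string"), Field("commissioned", "date", "%d.%m.%Y")]
cells = ["WKA 1", date(2018, 1, 1)]

# Default path (types=None): the date stays a date, and str() renders it as ISO 8601
print([str(c) for c in to_list(fields, cells)])  # ['WKA 1', '2018-01-01']

# Explicit types list without "date": the date is written with the field's format
print(to_list(fields, cells, types=["string"]))  # ['WKA 1', '01.01.2018']
```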
Solution:
For the default format, passing 'types=[]' would trigger the format conversion for the fields here:
https://github.com/frictionlessdata/framework/blob/main/frictionless/helpers.py#L84
data.append([cell if cell is not None else "" for cell in row.to_list(types=[])])
However, I am not sure whether that is the right solution, so I wanted your feedback before making changes. Thanks!
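The effect of the proposed line can be illustrated with a hypothetical 'FakeRow' class standing in for a frictionless row. Its 'to_list' only mimics the behaviour described above (cells of types not listed in 'types' are written out with their field's format); the class, the per-column formats and the sample rows are all assumptions for illustration. With 'types=[]' every non-missing cell is written as text, and the comprehension then replaces missing cells with "".

```python
from datetime import date
from typing import Any, List, Optional


class FakeRow:
    """Hypothetical row stand-in; not frictionless.Row."""

    def __init__(self, cells: List[Any], field_types: List[str], formats: List[Optional[str]]):
        self.cells = cells
        self.field_types = field_types
        self.formats = formats  # per-column strftime format, or None (assumed)

    def to_list(self, types: Optional[List[str]] = None) -> List[Any]:
        if types is None:
            return list(self.cells)
        out: List[Any] = []
        for cell, field_type, fmt in zip(self.cells, self.field_types, self.formats):
            if cell is None or field_type in types:
                out.append(cell)  # missing or natively supported: left untouched
            elif fmt is not None:
                out.append(cell.strftime(fmt))  # written with the field's format
            else:
                out.append(str(cell))
        return out


field_types = ["string", "date", "number"]
formats = [None, "%d.%m.%Y", None]
rows = [
    FakeRow(["WKA 1", date(2018, 1, 1), 3.2], field_types, formats),
    FakeRow(["WKA 2", None, None], field_types, formats),
]

data: List[List[Any]] = []
for row in rows:
    # The proposed line: with types=[] every cell is written out, and missing cells become ""
    data.append([cell if cell is not None else "" for cell in row.to_list(types=[])])

print(data)  # [['WKA 1', '01.01.2018', '3.2'], ['WKA 2', '', '']]
```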
Hi @jze,
The difference between the two examples might be caused by a typo in the schema (https://opendatarepo.lsh.uni-kiel.de/schema/windkraftanlagen.schema.json): the date format is %d.%m.%y instead of %d.%m.%Y.
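Why the '%y' typo matters can be seen with the standard library alone: '%y' is the two-digit year directive and '%Y' the four-digit one. The sketch below (sample value assumed) shows that '%d.%m.%y' cannot parse a four-digit year and drops the century when formatting; how frictionless itself reacts to such a mismatch is not modelled here.

```python
from datetime import date, datetime

value = "01.01.2018"  # sample cell value, assumed for illustration

# Correct four-digit year directive
print(datetime.strptime(value, "%d.%m.%Y").date())  # 2018-01-01

# The two-digit directive consumes only "20", leaving "18" unparsed
try:
    datetime.strptime(value, "%d.%m.%y")
    typo_error = None
except ValueError as exc:
    typo_error = str(exc)
print("parse with %y failed:", typo_error)

# When writing, %y drops the century
print(date(2018, 1, 1).strftime("%d.%m.%y"))  # 01.01.18
```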
In general, the framework outputs data compatible with the provided schema, so it prints, for example, 01.01.2018 because that is the format of this field.
I'll rename this issue to make it a feature request: we're going to review the mechanics behind this in v6 as we get closer to providing frictionless convert. Currently, it's an open question whether frictionless extract should return "normalized" data or not.
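To make the open question concrete, here is a sketch of the two possible CSV outputs for the same data: dates written in the field's declared format (schema-compatible) or "normalized" to ISO 8601. The 'export_csv' function and its 'normalize' flag are hypothetical and not a frictionless option; the column names and values are assumed for illustration.

```python
import csv
import io
from datetime import date
from typing import List


def export_csv(rows: List[List[object]], date_format: str, normalize: bool) -> str:
    """Hypothetical exporter: write dates either in ISO 8601 or in the schema format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        out: List[object] = []
        for cell in row:
            if isinstance(cell, date):
                # Normalized output uses ISO 8601; otherwise use the field's declared format
                out.append(cell.isoformat() if normalize else cell.strftime(date_format))
            elif cell is None:
                out.append("")
            else:
                out.append(cell)
        writer.writerow(out)
    return buffer.getvalue()


rows = [["name", "commissioned"], ["WKA 1", date(2018, 1, 1)]]
print(export_csv(rows, "%d.%m.%Y", normalize=False))  # dates as 01.01.2018
print(export_csv(rows, "%d.%m.%Y", normalize=True))   # dates as 2018-01-01
```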