
Would it be possible to reformat dates to ISO standards?

Open ManonGros opened this issue 3 years ago • 5 comments

Many datasets are flagged with "Recorded date invalid" on GBIF because the dates provided don't follow the ISO standards. See this example: https://www.gbif.org/occurrence/1890646124

In the IPT, people can choose the date format when uploading a spreadsheet or CSV file.

Would it be possible to reformat these dates to the ISO standards when the Archive is generated?

I thought the IPT already did that but when I tried to test it, it didn't seem to be the case. I think it would help a lot of publishers.

ManonGros avatar Feb 24 '22 14:02 ManonGros

@ManonGros Interpretation also supports MDY, but you need to specify that manually by adding a machine tag in the registry: name: default_date_format, value: one of ISO, MDY or DMY.
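To illustrate why the hint matters, here is a rough sketch of hint-driven parsing. It is not the actual interpretation code: the function name parse_with_hint, the pattern table and the "/" separator are all assumptions made for this example. Only the three values ISO, MDY and DMY come from the machine tag described above.

```python
from datetime import date, datetime

# Hypothetical mapping from the machine tag values (ISO, MDY, DMY) to
# strptime patterns. The "/" separator is an assumption for this sketch.
HINT_PATTERNS = {
    "ISO": "%Y-%m-%d",
    "MDY": "%m/%d/%Y",
    "DMY": "%d/%m/%Y",
}


def parse_with_hint(raw: str, hint: str = "ISO") -> date:
    """Parse a date string using the field order named by the hint.

    Raises KeyError for an unknown hint and ValueError when the string
    does not match the pattern for that hint.
    """
    return datetime.strptime(raw.strip(), HINT_PATTERNS[hint]).date()
```

The same string gives two different dates depending on the hint: parse_with_hint("03/04/2021", "MDY") is 4 March 2021, while parse_with_hint("03/04/2021", "DMY") is 3 April 2021. Without the hint there is no way to tell which one the publisher meant.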

muttcg avatar Feb 24 '22 14:02 muttcg

It would be best if the IPT dealt with this to provide a good quality DWCA.

It can either convert the dates from the datasource-specified date format to ISO format, or it could use the dateFormat property of Darwin Core Archive: https://dwc.tdwg.org/text/#221-attributes -- though we currently ignore that attribute during interpretation; I think last time I looked no-one had ever used it!
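As a minimal sketch of the first option (converting at archive generation), something like the following could work. This is not IPT code: the IPT is written in Java, and the function names to_iso and pattern_to_strptime, the pattern tokens YYYY, MM and DD, and the choice to leave unparseable values untouched are all assumptions made for illustration.

```python
from datetime import datetime


def pattern_to_strptime(pattern: str) -> str:
    """Translate a pattern such as DD/MM/YYYY into a strptime format.

    The YYYY, MM and DD tokens are an assumed notation for this sketch.
    """
    return pattern.replace("YYYY", "%Y").replace("MM", "%m").replace("DD", "%d")


def to_iso(raw: str, source_pattern: str) -> str:
    """Rewrite a date from the datasource-specified format as YYYY-MM-DD.

    Empty values and values that do not match the pattern are returned
    unchanged, so free-text dates are not lost or silently altered.
    """
    value = raw.strip()
    if not value:
        return value
    try:
        parsed = datetime.strptime(value, pattern_to_strptime(source_pattern))
    except ValueError:
        return value
    return parsed.date().isoformat()
```

With this, to_iso("24/02/2022", "DD/MM/YYYY") and to_iso("02/24/2022", "MM/DD/YYYY") both give "2022-02-24", while a value such as "spring 2022" passes through as-is. Whether non-matching values should pass through or be reported to the publisher is a design choice this sketch does not settle.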

MattBlissett avatar Mar 03 '22 13:03 MattBlissett

I think we can move the issue to the pipelines project. We already have a mechanism for date formats, so we can choose one of the preferred date formats on the interpretation side based on the DWCA date format field.

muttcg avatar Mar 03 '22 13:03 muttcg

I think it would be good to have it handled in the IPT (too). I don't think many publishers know about the dateFormat property of Darwin Core Archive (I have never seen it used either). It would help more people to have the conversion based on the datasource-specified date format in the IPT.

ManonGros avatar Mar 03 '22 13:03 ManonGros

Fixing it at source (in the IPT) means we have a better DWCA that others can interpret without workarounds. I'd prefer that to doing this in pipelines.

MattBlissett avatar Mar 03 '22 13:03 MattBlissett