Recommendations on missing/unknown/not recorded data in Darwin Core
This issue is inspired by Robert Mesibov's post on the GBIF discourse forum, "The vexed question of missing data in Darwin Core". The discussions in that thread and in Arctos are very insightful. (Thank you!)
In the post, Bob mentioned:
The Darwin Core recommendations don’t provide a lot of guidance. The entry “unknown” is recommended when footprintSRS, geodeticDatum, verticalDatum or verbatimSRS isn’t known. On the other hand, the recommendation for coordinateUncertaintyInMeters is Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates).
Take the term geodeticDatum, for example: "unknown" and "not recorded" are each recommended by a different source.
From Darwin Core Quick Reference Guide
Recommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value "unknown".
From Georeferencing Best Practices
It is thus recommended to record the EPSG code of the coordinate reference system if possible, otherwise, record the EPSG code of the datum if possible, otherwise, record the EPSG code of the ellipsoid. If none of these can be determined from the coordinate source, record "not recorded"
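The two quoted fallback chains share the same order (SRS, then datum, then ellipsoid) and differ only in the final placeholder. A minimal Python sketch of that logic follows; the function name `geodetic_datum_value` and its parameters are hypothetical and are not part of Darwin Core or any GBIF tool.

```python
from typing import Optional

# The two sources quoted above disagree on the final placeholder.
QUICK_REFERENCE_FALLBACK = "unknown"
GEOREFERENCING_FALLBACK = "not recorded"


def geodetic_datum_value(
    srs_epsg: Optional[str] = None,
    datum: Optional[str] = None,
    ellipsoid: Optional[str] = None,
    fallback: str = QUICK_REFERENCE_FALLBACK,
) -> str:
    """Return the first known value in the order SRS, datum, ellipsoid.

    Hypothetical helper for illustration only; `fallback` is whichever
    placeholder the chosen guide recommends.
    """
    for candidate in (srs_epsg, datum, ellipsoid):
        # Treat None and whitespace-only strings as "not known".
        if candidate is not None and candidate.strip():
            return candidate.strip()
    return fallback


if __name__ == "__main__":
    print(geodetic_datum_value(srs_epsg="EPSG:4326"))
    print(geodetic_datum_value(ellipsoid="GRS 1980"))
    print(geodetic_datum_value())
    print(geodetic_datum_value(fallback=GEOREFERENCING_FALLBACK))
```

The same input with nothing known yields two different strings depending on which guide is followed, which is the inconsistency that validation and indexing code then has to handle.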
These conflicting recommendations in turn affect downstream implementations such as:
- https://github.com/tdwg/bdq/issues/60
- https://github.com/gbif/occurrence/issues/84
Hence, I would appreciate general guidelines on how to treat the different scenarios of NITS (Nothing Interesting To Say) in Darwin Core. I appreciate Bob's suggestion in his post on how to treat missing data:
Here’s a possible answer to the “What to do with missing data?” question, and it’s one I regularly propose to the compilers whose Darwin Core data tables I audit: If a data item is missing, leave it blank. If you have a reason for the “missingness”, put it in a …Remarks field.
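Bob's suggestion could be applied to a record roughly as follows. This is only a sketch under stated assumptions: the placeholder list is a guess at common spellings, the helper `blank_missing` is hypothetical, and which …Remarks term receives the note is left to the caller.

```python
# Assumed placeholder spellings; a real audit would tune this per dataset.
PLACEHOLDERS = {"unknown", "not recorded", "n/a", "na", "none", "null", "?", "-"}


def blank_missing(record: dict, term: str, remarks_term: str) -> dict:
    """Blank a placeholder in `term` and note it in `remarks_term`.

    Returns a new dict; the input record is not modified.
    """
    cleaned = dict(record)
    value = (cleaned.get(term) or "").strip()
    if value.lower() in PLACEHOLDERS:
        cleaned[term] = ""
        note = f"{term}: {value}"
        existing = (cleaned.get(remarks_term) or "").strip()
        # Append to any existing remark, using " | " as the separator.
        cleaned[remarks_term] = f"{existing} | {note}" if existing else note
    return cleaned


if __name__ == "__main__":
    row = {"geodeticDatum": "not recorded", "georeferenceRemarks": ""}
    print(blank_missing(row, "geodeticDatum", "georeferenceRemarks"))
```

Real values such as "WGS84" pass through unchanged, and an already blank field stays blank without generating a remark.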
Thanks a lot!
In the context of transcribing labels from specimens, we also recommended breaking down "unknown" into:
- unknown
- unknown:undigitized
- unknown:missing
- unknown:indecipherable
- known:withheld
Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129
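If values like those listed above were adopted, a consumer could split them into a status and an optional reason. The sketch below assumes the five strings are used exactly as written; `MissingStatus` and `parse_missing_status` are hypothetical names, not something defined by the cited paper.

```python
from typing import NamedTuple, Optional


class MissingStatus(NamedTuple):
    status: str            # "unknown" or "known"
    reason: Optional[str]  # e.g. "missing", or None for plain "unknown"


# The five values listed above, taken verbatim.
VOCABULARY = {
    "unknown",
    "unknown:undigitized",
    "unknown:missing",
    "unknown:indecipherable",
    "known:withheld",
}


def parse_missing_status(value: str) -> Optional[MissingStatus]:
    """Return a MissingStatus for a vocabulary value, else None."""
    token = value.strip().lower()
    if token not in VOCABULARY:
        return None  # ordinary data, not a missing-value marker
    status, _, reason = token.partition(":")
    return MissingStatus(status, reason or None)


if __name__ == "__main__":
    print(parse_missing_status("unknown:indecipherable"))
    print(parse_missing_status("known:withheld"))
    print(parse_missing_status("Tasmania"))
```

Matching against the full vocabulary rather than just the prefix keeps ordinary values that happen to contain a colon from being misread as markers.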
Here's a good summary from Data Carpentry about missing values as blanks:
https://datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/#null