Recommendations on missing/unknown/not recorded data in Darwin Core

Open ymgan opened this issue 2 years ago • 2 comments

This issue is inspired by Robert Mesibov's post in GBIF discourse - The vexed question of missing data in Darwin Core. The discussions on the thread and Arctos are very insightful. (Thank you!)

In the post, Bob mentioned:

The Darwin Core recommendations don’t provide a lot of guidance. The entry “unknown” is recommended when footprintSRS, geodeticDatum, verticalDatum or verbatimSRS isn’t known. On the other hand, the recommendation for coordinateUncertaintyInMeters is “Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates).”

Take the term geodeticDatum, for example: different sources recommend "unknown" and "not recorded" for the same situation.

From Darwin Core Quick Reference Guide

Recommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value unknown.

From Georeferencing Best Practices

It is thus recommended to record the EPSG code of the coordinate reference system if possible, otherwise, record the EPSG code of the datum if possible, otherwise, record the EPSG code of the ellipsoid. If none of these can be determined from the coordinate source, record "not recorded".
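To make the difference concrete, here is a minimal sketch of the fallback order both documents describe. The function name `choose_geodetic_datum` and its parameters are hypothetical, not part of Darwin Core or of any library, and the sketch does not validate values against EPSG or a controlled vocabulary. The only point is that the two sources agree on the order and differ in the final fallback string.

```python
def choose_geodetic_datum(srs_epsg=None, datum=None, ellipsoid=None,
                          fallback="unknown"):
    """Pick a geodeticDatum value using the SRS -> datum -> ellipsoid order.

    All names here are illustrative assumptions. `fallback` is "unknown"
    per the Quick Reference Guide; pass "not recorded" to follow the
    Georeferencing Best Practices wording instead.
    """
    for candidate in (srs_epsg, datum, ellipsoid):
        # Treat None and whitespace-only strings as "not known".
        if candidate is not None and str(candidate).strip():
            return str(candidate).strip()
    return fallback


print(choose_geodetic_datum(srs_epsg="EPSG:4326"))
print(choose_geodetic_datum(ellipsoid="GRS 1980"))
print(choose_geodetic_datum())
print(choose_geodetic_datum(fallback="not recorded"))
```

The same input with nothing known yields two different strings depending on which document a publisher followed, so downstream checks have to recognise both.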

These recommendations in turn affect downstream implementations such as:

  • https://github.com/tdwg/bdq/issues/60
  • https://github.com/gbif/occurrence/issues/84

Hence I would appreciate general guidelines on how to treat the different scenarios of NITS (Nothing Interesting To Say) in Darwin Core. I appreciate Bob's suggestion on how to treat missing data in his post:

Here’s a possible answer to the “What to do with missing data?” question, and it’s one I regularly propose to the compilers whose Darwin Core data tables I audit: If a data item is missing, leave it blank. If you have a reason for the “missingness”, put it in a …Remarks field.
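As a rough illustration of that suggestion, the sketch below blanks placeholder strings in a single record and moves any stated reason into a remarks field chosen by the caller. The `PLACEHOLDERS` set and the `blank_missing` function are hypothetical and only illustrate the idea; neither is a Darwin Core vocabulary nor an existing tool. Note that it would also blank terms such as geodeticDatum, where the current recommendation is to write "unknown", so it would need an exception list in practice.

```python
# Hypothetical placeholder strings often seen in data tables (an assumption,
# not a Darwin Core controlled vocabulary).
PLACEHOLDERS = {"unknown", "not recorded", "n/a", "na", "null", "none", "?", "-"}


def blank_missing(record, remarks_field, reasons=None):
    """Return a copy of `record` with placeholder values replaced by "".

    `reasons` maps a field name to the reason its value is missing; reasons
    for blanked fields are appended to `remarks_field`.
    """
    reasons = reasons or {}
    cleaned = dict(record)
    notes = []
    for field, value in record.items():
        if field == remarks_field or not isinstance(value, str):
            continue
        text = value.strip().lower()
        if text == "" or text in PLACEHOLDERS:
            cleaned[field] = ""
            if field in reasons:
                notes.append(f"{field}: {reasons[field]}")
    if notes:
        existing = cleaned.get(remarks_field) or ""
        # Keep any remarks already present and append the new notes.
        cleaned[remarks_field] = " | ".join(filter(None, [existing] + notes))
    return cleaned


record = {"recordedBy": "N/A", "eventDate": "2020-01-01", "occurrenceRemarks": ""}
cleaned = blank_missing(record, "occurrenceRemarks",
                        {"recordedBy": "collector name not on label"})
print(cleaned)
```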

Thanks a lot!

ymgan • Feb 27 '23 15:02

In the context of transcribing labels from specimens we also made a recommendation to break down unknown into...

  • unknown
  • unknown:undigitized
  • unknown:missing
  • unknown:indecipherable
  • known:withheld

Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129
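For anyone who wants to check data against these values, a minimal sketch of the five strings as a Python enum follows. The class name `Missingness` and the helper `parse_missingness` are hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption of this sketch rather than something stated in the paper.

```python
from enum import Enum


class Missingness(str, Enum):
    """The five values listed above; member names are illustrative."""
    UNKNOWN = "unknown"
    UNDIGITIZED = "unknown:undigitized"
    MISSING = "unknown:missing"
    INDECIPHERABLE = "unknown:indecipherable"
    WITHHELD = "known:withheld"


def parse_missingness(value):
    """Return the matching Missingness member, or None for any other value."""
    try:
        # Enum lookup by value raises ValueError when nothing matches.
        return Missingness(value.strip().lower())
    except ValueError:
        return None


print(parse_missingness("unknown:indecipherable"))
print(parse_missingness("WGS84"))
```

A transcription pipeline could use the return value to tell "the label has no such information" apart from "the information exists but was not transcribed or is withheld".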

qgroom • Feb 27 '23 15:02

Here's a good summary from Data Carpentry about missing values as blanks:

https://datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/#null

Mesibov • Mar 03 '23 22:03