Future of requiring basisOfRecord and controlled vocabulary
A new recommendation has been put forward to modify basisOfRecord, which (I believe) would result in changes to the controlled vocabulary currently in place for basisOfRecord in the IPT. To publish data via the IPT you are required to include basisOfRecord, and each value in that field must exactly match one of the following: PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence, MaterialCitation (e.g. "Human Observation" will cause a dataset to be rejected).
Will the IPT continue to require basisOfRecord? Will the controlled vocabulary be updated to follow the suggested changes?
@timrobertson100
Assuming that recommendation does go through in the next edition of DwC:
Will the IPT continue to require basisOfRecord?
Yes, I would expect so, as the reason it was first made a required field remains valid (i.e. the need to have a basic understanding of what the record actually represents).
Will the controlled vocabulary be updated to follow the suggested changes?
Yes, although I don't imagine that this would impact things as much as one might think. I'd anticipate that:
- On data publishing, the IPT would handle a spreadsheet containing preserved_specimen, PRESERVED_SPECIMEN, preservedSpecimen etc. and map that onto the correct term.
- When the IPT produces the output archive, the value it puts out for that field might change in format, slightly. I doubt many consumers would notice, as it's fairly common to consume this stuff in a case-insensitive manner, ignoring whitespace and _ characters.
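To make the idea concrete, here is a minimal sketch of that kind of lenient matching. It is not the IPT's code; the names `CANONICAL_VALUES`, `normalise_key` and `to_canonical` are made up for illustration, and the term list is simply the one quoted at the top of this thread.

```python
import re

# Canonical basisOfRecord values, as listed at the top of this thread.
CANONICAL_VALUES = [
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "MaterialSample",
    "Event", "HumanObservation", "MachineObservation", "Taxon",
    "Occurrence", "MaterialCitation",
]


def normalise_key(value: str) -> str:
    """Drop whitespace and underscores, then lower-case the value."""
    return re.sub(r"[\s_]+", "", value).lower()


# Lookup from normalised key to canonical term (hypothetical helper).
_LOOKUP = {normalise_key(term): term for term in CANONICAL_VALUES}


def to_canonical(value: str):
    """Return the canonical term, or None when the value matches nothing."""
    return _LOOKUP.get(normalise_key(value))
```

With this, `to_canonical("preserved_specimen")`, `to_canonical("PRESERVED_SPECIMEN")` and `to_canonical("preservedSpecimen")` all resolve to the same canonical term, while a value outside the vocabulary returns `None` and could still be rejected.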
Thanks Tim!
On data publishing, the IPT would handle a spreadsheet containing preserved_specimen, PRESERVED_SPECIMEN, preservedSpecimen etc and map that onto the correct term.
The problem is that this is not currently how the IPT works. If you have anything but "PreservedSpecimen" it will be rejected, meaning you either have to update the file so the data in it matches that exactly or create a translation.
So will the IPT be updated to handle any of a number of input values?
(Note that this would be a welcome change from my perspective because having to explain that you have to put "PreservedSpecimen" exactly has always been a little fraught)
For thoroughness's sake I also tried "human_observation" with the same result:
Thanks for confirming that, Abby. That seems like a nuisance, and something perhaps we should just fix for any vocabulary field on import (ignore whitespace, casing and _). Agree?
Yes!
just fix for any vocabulary field on import
I assume this conversion in the IPT would only be done for those fields that require a vocabulary? Or would this also be done for fields where a vocabulary is recommended?
Thanks @peterdesmet - yes, I think applying only to those controlled fields the IPT requires makes sense.
Basically, it's to stop the IPT failing the publication just because you wrote presErved_SpeciMen instead of preservedSpecimen (or whatever format is needed). That is just a nuisance that can be easily handled.
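As a rough illustration of that scope (only fields with a required vocabulary get the lenient treatment, everything else is left alone), something like the sketch below would do. This is an assumption about how it could look, not how the IPT implements it; `REQUIRED_VOCABULARIES` and `clean_record` are hypothetical names, and the only configured field is basisOfRecord with the values quoted at the top of this thread.

```python
import re

# Hypothetical configuration: fields with a required vocabulary and their terms.
REQUIRED_VOCABULARIES = {
    "basisOfRecord": [
        "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
        "MaterialSample", "Event", "HumanObservation", "MachineObservation",
        "Taxon", "Occurrence", "MaterialCitation",
    ],
}


def _key(value: str) -> str:
    """Ignore whitespace, casing and underscores when comparing values."""
    return re.sub(r"[\s_]+", "", value).lower()


def clean_record(record: dict) -> dict:
    """Return a copy of the record with required-vocabulary fields canonicalised.

    Fields that are not listed in REQUIRED_VOCABULARIES are copied unchanged.
    A value that matches no term, even after normalisation, raises ValueError.
    """
    cleaned = dict(record)
    for field, terms in REQUIRED_VOCABULARIES.items():
        if field not in cleaned:
            continue
        lookup = {_key(term): term for term in terms}
        key = _key(cleaned[field])
        if key not in lookup:
            raise ValueError(f"{field}: unrecognised value {cleaned[field]!r}")
        cleaned[field] = lookup[key]
    return cleaned
```

So a row with basisOfRecord set to presErved_SpeciMen would be published with the canonical term, any other column in the row would pass through untouched, and a value that is genuinely outside the vocabulary would still fail.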