dwc
New Term - parentMeasurementID
- Submitter: Guillaume Body, Anne-Sophie Archambeau, Sophie Pamerlon
- Efficacy Justification (why is this term necessary?): Estimated records are a wide group of data that share similar information, mostly about statistical precision: confidence interval, standard deviation, distribution. These measurements describe the precision of other measurements (the main estimated value). To describe this relation correctly, the DwC standard needs to allow measurements to be nested within other measurements, just as events can be nested within each other.
- Demand Justification (name at least two organizations that independently need this term): European Food Safety Authority (enetwild project), French Office of Biodiversity, potentially GEO BON (all essential biodiversity variables are statistically estimated)
Proposed attributes of the new term:
- Term name (in lowerCamelCase for properties, UpperCamelCase for classes): parentMeasurementID
- Organized in Class (e.g., Occurrence, Event, Location, Taxon): MeasurementOrFact
- Definition of the term (normative): An identifier for the broader Measurement that groups this and potentially other Measurements or fact
- Usage comments (recommendations regarding content, etc., not normative): Use a globally unique identifier for a dwc:MeasurementOrFact or an identifier for a dwc:MeasurementOrFact that is specific to the data set.
- Examples (not normative): 9c752d22-b09a-11e8-96f8-529269fb1459 ; E1_E1_O1_M1
- Note: for correct identification of the record, the basisOfRecord should include a new value: "statistical estimation"
This looks like a valuable generic way to extend MeasurementOrFacts.
The definition ("group this and potentially other Measurements or fact") suggests that the term might be used in ways other than the use case described in the Efficacy Justification (measurements of measurements). Do you envision other uses? And can you give examples?
2021-07-27 I retract the following opinion based on this commentary. - JRW
I am a bit concerned about the note. In implementation in Darwin Core Archives, the basisOfRecord term is only usable in Occurrence Core records, and has a recommended vocabulary. There does not seem to be a viable way to use basisOfRecord here. However, "statistical estimation" might be plausible as a part of the vocabulary used in dwc:measurementType. I say "part of" because it would not be sufficient on its own; it would have to be "statistical estimation of something".
Are measurements that share a common parent all siblings?
This term would allow sibling measurements of a parent measurement to be recorded. For instance, a roe deer density estimation could be recorded as follows:
- Event 1: the area of the study
- Occurrence 1: the species and the time period, and the basisOfRecord "statistical estimation"
- Measurement 1: measurementType = density ; measurementValue = 15 ; measurementUnit = individual per kilometer square
  - Measurement 1-1: measurementType = standard deviation ; measurementValue = 3.2 ; measurementUnit = individual per kilometer square
  - Measurement 1-2: measurementType = distribution ; measurementValue = gaussian
  - Measurement 1-3: measurementType = confidence interval ; measurementValue = 9|21 ; measurementUnit = individual per kilometer square
    - Measurement 1-3-1: measurementType = confidence level ; measurementValue = 95 ; measurementUnit = percentage

Measurements 1-1, 1-2 and 1-3 are indeed siblings and describe the parent one, the density estimation per se. If you remove the measurements introduced by this new term, you get what Darwin Core currently allows.
The definition is very similar to the definition of parentEventID, and the use is indeed similar, except that it applies to MeasurementOrFact instead of Event. In this density estimation dataset, neither a human nor a machine has directly observed a roe deer. Those observations would be found in the raw data dataset. Here, the "presence" of roe deer is only due to a statistical software run. It is even clearer if you think about a dataset based on "probability of presence", such as the results of a habitat suitability statistical procedure. The new basisOfRecord value also makes it possible to differentiate "expert knowledge" of density, which is "human observation", from statistical estimation, without changing the measurementType: "density".
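To make the nesting concrete, here is a minimal Python sketch of how a consumer might rebuild the hierarchy of the roe deer example from flat MeasurementOrFact records. The identifiers `M1`, `M1-1`, etc. and the helper names `children_index` and `descendants` are hypothetical, chosen only for illustration; nothing here is part of Darwin Core or of any existing library.

```python
from collections import defaultdict

# Flat records mirroring the roe deer example; the IDs are hypothetical.
measurements = [
    {"measurementID": "M1", "parentMeasurementID": None,
     "measurementType": "density", "measurementValue": "15"},
    {"measurementID": "M1-1", "parentMeasurementID": "M1",
     "measurementType": "standard deviation", "measurementValue": "3.2"},
    {"measurementID": "M1-2", "parentMeasurementID": "M1",
     "measurementType": "distribution", "measurementValue": "gaussian"},
    {"measurementID": "M1-3", "parentMeasurementID": "M1",
     "measurementType": "confidence interval", "measurementValue": "9|21"},
    {"measurementID": "M1-3-1", "parentMeasurementID": "M1-3",
     "measurementType": "confidence level", "measurementValue": "95"},
]


def children_index(records):
    """Map each parentMeasurementID to the IDs of its direct children."""
    index = defaultdict(list)
    for rec in records:
        index[rec["parentMeasurementID"]].append(rec["measurementID"])
    return dict(index)


def descendants(index, root):
    """Return every measurement nested under root, depth-first."""
    found = []
    stack = list(reversed(index.get(root, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(index.get(node, [])))
    return found


index = children_index(measurements)
siblings = index["M1"]                 # direct children of the density estimate
nested = descendants(index, "M1")      # includes the confidence level under M1-3
top_level = index[None]                # what remains without the new term
```

Records whose parentMeasurementID is empty are exactly the measurements that can already be expressed today; everything else is reachable only through the parent link.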
Thank you for this example @guillaumebody. Now that I see better what you are trying to do, I retract my comment. The Occurrence records in the Occurrence extension can each bear a basisOfRecord, so the remaining issue would be to create a new class term proposal for something like StatisticalEstimation to accompany the existing Occurrence types (PreservedSpecimen, LivingSpecimen, FossilSpecimen, MachineObservation, HumanObservation, MaterialCitation).
The OBIS Secretariat and nodes have reviewed the proposal and while we do not have an immediate use case to apply it to, we can see it being a valuable addition to the MoF extensions. If ratified as a new term, OBIS will ensure it's added to the extended measurement or fact extension.
Hi all, I would like to bring to your attention https://github.com/gbif/rs.gbif.org/issues/103 which proposes to add dwc:relatedResourceID (or rather dwc:resourceID) to the ExtendedMeasurementOrFact extension. As @albenson-usgs pointed out, adding this term to the MeasurementOrFact extension would probably address the parent measurement issue discussed here as well.
Hi Pieter, this term would indeed technically do the job. In my view, however, there is a clear difference between a "relatedID" and a "parentID".
A parentID (for an Event, Occurrence, Measurement, ...) is a clear indication of nested records, a "within" term. Through relatedResourceID, you can link very different pieces of information that share very different relationships. Merging the two will inevitably end in confusion.
For instance, you could have estimations of population density through 2 methods: method 1 giving 10 (95% CI 8-12) and method 2 giving 12 (95% CI 9-15).
measurementID | parentMeasurementID | relatedResourceID | measurementType | measurementValue | measurementUnit |
---|---|---|---|---|---|
uuid_1 | | uuid_2 | density | 10 | individual per kilometer square |
uuid_11 | uuid_1 | | x_0.025 | 8 | individual per kilometer square |
uuid_12 | uuid_1 | | x_0.975 | 12 | individual per kilometer square |
uuid_2 | | uuid_1 | density | 12 | individual per kilometer square |
uuid_21 | uuid_2 | | x_0.025 | 9 | individual per kilometer square |
uuid_22 | uuid_2 | | x_0.975 | 15 | individual per kilometer square |
If needed, you can add a crossed relatedResourceID between uuid_1 and uuid_2 to indicate that they are estimations of the same element, or a relatedResourceID pointing to the probability density plot of each estimation, without mixing it with the structure of your data. Of course, a generic parentResourceID would work well alongside a generic relatedResourceID.
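As a rough illustration of why keeping the two links separate helps, the Python sketch below reads the rows of the table above and uses parentMeasurementID only for nesting, while relatedResourceID is kept as a lateral cross-reference. The tuple layout and the function name `nest_estimates` are assumptions made for this example, not an existing API.

```python
# Rows follow the table above:
# (measurementID, parentMeasurementID, relatedResourceID, measurementType, measurementValue)
rows = [
    ("uuid_1", None, "uuid_2", "density", 10),
    ("uuid_11", "uuid_1", None, "x_0.025", 8),
    ("uuid_12", "uuid_1", None, "x_0.975", 12),
    ("uuid_2", None, "uuid_1", "density", 12),
    ("uuid_21", "uuid_2", None, "x_0.025", 9),
    ("uuid_22", "uuid_2", None, "x_0.975", 15),
]


def nest_estimates(rows):
    """Group child measurements under their parent; keep related links apart."""
    estimates = {}
    # First pass: top-level estimates (no parent).
    for mid, parent, related, mtype, value in rows:
        if parent is None:
            estimates[mid] = {"type": mtype, "value": value,
                              "related": related, "children": {}}
    # Second pass: attach each child to its parent by measurementType.
    for mid, parent, related, mtype, value in rows:
        if parent is not None:
            estimates[parent]["children"][mtype] = value
    return estimates


estimates = nest_estimates(rows)
for mid, est in estimates.items():
    bounds = est["children"]
    print(f"{mid}: {est['type']} = {est['value']} "
          f"[{bounds['x_0.025']}, {bounds['x_0.975']}], related to {est['related']}")
```

With a single relatedResourceID column carrying both kinds of link, the second pass could not tell a quantile of uuid_1 from the alternative estimate uuid_2 without extra information such as a relationship type.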