dwc-qa
Linking related occurrence records, e.g. for stomach content data
Are there best practices to link related occurrence records?
For example, I have a dataset resulting from a beam trawl with abundances of fish (say, 100 specimens of cod in a trawl). For a number of these specimens, biometrics were recorded (Cod 1 = 15 cm, Cod 2 = 25 cm, etc.). For some specimens, the stomach contents were analyzed for otoliths of different fish species, and the dimensions of the otoliths were recorded.
Is this something for which you would use the terms associatedOccurrences, associatedOrganisms or the ResourceRelationship extension? The ResourceRelationship extension seems most useful, but I don't think it's compatible with Event Core. Are there example datasets which use these terms, or other groups that handle stomach content data?
The issue is relevant to relationships in general, and in that respect is similar to issue #77.
Easy one! Ha, just kidding.
I think a ResourceRelationship [1] has the most potential expressivity and explicitness. The other options are not really meant for this level of richness, not to mention that associatedOccurrences and associatedOrganisms are not part of the Event Core [2]. You are right though that the ResourceRelationship extension [3] takes an occurrenceID as its key. Though there is nothing in the IPT that prevents publishing an Event Core with an Occurrence extension [4] and a ResourceRelationship extension containing occurrenceIDs in the resourceID field (the IPT only gives a warning), consumers would expect the resourceIDs to contain eventIDs relating to the core in this context. So, you could publish the relationships that way, but your audience would have to be aware of what to do with them. One could describe this in the resource metadata.
Is there anything about the data that would prevent them from being modeled on an Occurrence Core? I have yet to see an "Event" data set that couldn't be (a not so hidden challenge). The only minor challenge is what to do with parentEventIDs, but that is easily surmountable with the ResourceRelationship extension also. With an Occurrence data set it would be straightforward to capture the relationships by giving the cod and their otoliths separate occurrenceIDs.
But the resource relationship is only part of the story presented. The rest has to do with the measurements. Under the Event Core, the measurements for the Occurrences could be captured using the trick of the Extended Measurement or Fact extension [5]. This too is non-standard in that it goes beyond the star schema limitation posed by the GBIF-accepted Darwin Core archive [6] and requires knowledge by the consumer of how to use it. Basically, it provides links to both the core Event and to an Occurrence, and allows one to propagate measurements for either one by including, or not, the occurrenceID in the extension.
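To make the "including, or not, the occurrenceID" idea concrete, here is a minimal sketch of what Extended Measurement or Fact rows could look like under an Event Core. The identifiers (`trawl-1`, `cod-1`, `cod-2`), the trawl-duration measurement, and the helper names are hypothetical; only the two cod lengths come from the question above. It is an illustration of the row layout, not a validated archive.

```python
import csv
import io

# Column names are Darwin Core / eMoF terms; all values are illustrative.
EMOF_FIELDS = ["eventID", "occurrenceID", "measurementType",
               "measurementValue", "measurementUnit"]

emof_rows = [
    # No occurrenceID: the measurement describes the trawl event itself
    # (hypothetical measurement, not from the original dataset).
    {"eventID": "trawl-1", "occurrenceID": "",
     "measurementType": "trawl duration",
     "measurementValue": "30", "measurementUnit": "min"},
    # occurrenceID present: the measurement describes one cod from that trawl.
    {"eventID": "trawl-1", "occurrenceID": "cod-1",
     "measurementType": "total length",
     "measurementValue": "15", "measurementUnit": "cm"},
    {"eventID": "trawl-1", "occurrenceID": "cod-2",
     "measurementType": "total length",
     "measurementValue": "25", "measurementUnit": "cm"},
]


def split_by_target(rows):
    """Separate event-level rows from occurrence-level rows."""
    event_level = [r for r in rows if not r["occurrenceID"]]
    occurrence_level = [r for r in rows if r["occurrenceID"]]
    return event_level, occurrence_level


def to_tsv(rows, fields):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


event_level, occurrence_level = split_by_target(emof_rows)
print(to_tsv(emof_rows, EMOF_FIELDS))
```

A consumer that ignores the occurrenceID column would attribute all three rows to the event, which is why this approach depends on the consumer knowing how to read it.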
In the Occurrence Core scenario described above, the measurements would be easy to capture using the normal Measurement Or Fact extension [7], with the cod having their measurements and the otoliths, as separate Occurrences, having theirs. Again, the cod would be related to their stomach content Occurrences through ResourceRelationships.
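The Occurrence Core layout can be sketched as three small tables. This is only an illustration under assumptions: the identifiers, the prey species, the otolith length, and the relationship label "stomach content of" are all hypothetical, and the lookup function is not part of any Darwin Core tooling. The column names are Darwin Core terms, with resourceID as the subject of the relationship and relatedResourceID as its object.

```python
# Occurrence Core: the cod and the otolith each get their own occurrenceID.
occurrences = [
    {"occurrenceID": "cod-1", "scientificName": "Gadus morhua"},
    # Hypothetical prey species, chosen only for illustration.
    {"occurrenceID": "otolith-1", "scientificName": "Clupea harengus"},
]

# ResourceRelationship extension: subject (resourceID) relates to
# object (relatedResourceID). The relationship label is an assumption.
resource_relationships = [
    {"resourceID": "otolith-1",
     "relationshipOfResource": "stomach content of",
     "relatedResourceID": "cod-1"},
]

# Measurement Or Fact extension: keyed on occurrenceID.
measurements = [
    {"occurrenceID": "cod-1", "measurementType": "total length",
     "measurementValue": "15", "measurementUnit": "cm"},
    # Hypothetical otolith measurement.
    {"occurrenceID": "otolith-1", "measurementType": "otolith length",
     "measurementValue": "4.2", "measurementUnit": "mm"},
]


def stomach_contents(predator_id):
    """Return {prey occurrenceID: [measurement rows]} for one predator."""
    prey_ids = [
        rel["resourceID"]
        for rel in resource_relationships
        if rel["relatedResourceID"] == predator_id
        and rel["relationshipOfResource"] == "stomach content of"
    ]
    return {
        prey_id: [m for m in measurements if m["occurrenceID"] == prey_id]
        for prey_id in prey_ids
    }


print(stomach_contents("cod-1"))
```

Because every record is an Occurrence, both extensions key on occurrenceID and stay within the star schema.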
[1] ResourceRelationship: http://rs.tdwg.org/dwc/terms/index.htm#relindex
[2] Event Core: https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Event
[3] ResourceRelationship extension: https://tools.gbif.org/dwca-validator/extension.do?id=dwc:ResourceRelationship
[4] Occurrence extension: https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence
[5] Extended Measurement or Fact extension: https://tools.gbif.org/dwca-validator/extension.do?id=http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact
[6] Darwin Core Archive: https://github.com/gbif/ipt/wiki/DwCAHowToGuide
[7] Measurement or Fact extension: https://tools.gbif.org/dwca-validator/extension.do?
Reply to:
Is there anything about the data that would prevent them from being modeled on an Occurrence Core?
To go all the way to the other side of the argument (and when we have already dipped the feet into the kidding mode): Is there anything that would prevent any Occurrence Core dataset to be modelled as an Event Core? Do we really need Occurrence Core at all? :-)
See also: https://www.gbif.no/news/2018/gbif_obis_event_core_workshop.html
https://www.slideshare.net/DagEndresen/gbifobis-hackathon-in-brussels-2018-0116
https://bit.ly/gbifEu2018_new_datatypes
On 14:34, Tue, Jul 3, 2018 Dag Endresen wrote:
Reply to:
Is there anything about the data that would prevent them from being modeled on an Occurrence Core?
To go all the way to the other side of the argument (and when we have already dipped the feet into the kidding mode): Is there anything that would prevent any Occurrence Core dataset to be modelled as an Event Core? Do we really need Occurrence Core at all? :-)
Certainly not, but given two possible options, why not use the one easier to produce and to consume?
See also: https://www.gbif.no/news/2018/gbif_obis_event_core_workshop.html
https://www.slideshare.net/DagEndresen/gbifobis-hackathon-in-brussels-2018-0116
https://www.slideshare.net/DagEndresen/event-core-and-new-datatypes-in-gbif-10th-european-gbif-nodes-meeting-in-tallinn-estonia-may-2018
Certainly not, but given two possible options, why not use the one easier to produce and to consume?
Which is ... event core ;-) At least when publishing most other data types than museum collection specimens.
Please don't take this response as snarky, because I'm actually serious. The whole "star schema" thing, event core, occurrence core, associatedOccurrences, associatedOrganisms, and the ResourceRelationship class are all hacks because we are trying to handle a graph with spreadsheets. If we are really serious about integrating diverse kinds of data, then let's expend some effort and money creating a graph model that includes all of the "star schemas" that people care about and start dumping data into a graph database.
Because we have poured so much time and energy into the ontology-development sinkhole with nothing to show for it in terms of usable products, I think people are gun-shy about anything that reminds them of RDF. However, simply linking data according to a graph model and loading it into a graph database is neither complicated nor difficult. I linked every different kind of data I could find from GBIF and loaded it into Blazegraph a couple of years ago, and it only took me about a week of my spare time to make it work. (Read about the experiment here and here.) Obviously, more time and effort would be required to scale up to the level at which GBIF operates, but at least in principle operating a graph database at that scale is totally possible.
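As a rough illustration of what "linking data according to a graph model" can mean, independent of any particular graph database, this sketch stores the trawl, cod, and otolith records from the original question as subject-predicate-object triples and follows the links. It is not the Blazegraph experiment mentioned above; the identifiers, predicate names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Each fact is one (subject, predicate, object) triple.
# Identifiers and predicate names are illustrative, not Darwin Core terms.
triples = [
    ("cod-1", "collectedDuring", "trawl-1"),
    ("cod-2", "collectedDuring", "trawl-1"),
    ("otolith-1", "stomachContentOf", "cod-1"),
    ("cod-1", "lengthInCm", "15"),
    ("cod-2", "lengthInCm", "25"),
]

# Index subjects by (predicate, object) so links can be followed backwards.
by_object = defaultdict(list)
for subject, predicate, obj in triples:
    by_object[(predicate, obj)].append(subject)


def subjects(predicate, obj):
    """Return all subjects linked to obj by predicate."""
    return by_object.get((predicate, obj), [])


def prey_found_in_trawl(trawl_id):
    """Two-hop traversal: trawl -> fish caught -> their stomach contents."""
    prey = []
    for fish in subjects("collectedDuring", trawl_id):
        prey.extend(subjects("stomachContentOf", fish))
    return prey


print(prey_found_in_trawl("trawl-1"))
```

The two-hop query needs no decision about which record type is the "core", which is the point being argued here. A real deployment would use a graph database and shared identifiers rather than an in-memory list.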
A very important question that would need to be answered before embarking on the venture I've suggested is: what would we actually do with the integrated data? Integrating data based on varying "star schemas" for its own sake is not a good enough reason to do it. What questions could we actually answer if we could achieve the holy grail of full data integration? This question is complicated by the fact that every "star" leaves out parts of the larger graph, so the resulting graph ends up missing many potentially useful bits of data that might be required to answer really interesting questions.