Change term - MaterialSample

Open Jegelewicz opened this issue 4 years ago • 139 comments

Change term

  • Submitter: @jegelewicz
  • Justification (why is this change necessary?): The definition of MaterialSample is essentially the same as that for PreservedSpecimen. Members of the Arctos Working Group feel that these two terms are currently interchangeable. See https://github.com/ArctosDB/arctos/issues/2432 for further discussion.

From https://dwc.tdwg.org/terms/#materialsample

MaterialSample info
Definition A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.
Examples A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.

From https://dwc.tdwg.org/terms/#preservedspecimen

PreservedSpecimen info
Definition A specimen that has been preserved.
Comments  
Examples A plant on an herbarium sheet. A cataloged lot of fish in a jar.

Given the above, we propose that MaterialSample should be more specific to something less than what might be considered a "voucher" in order to delineate it from PreservedSpecimen.

  • Proponents (who needs this change): Arctos Working Group

Proposed new attributes of the term:

  • Term name (in lowerCamelCase): MaterialSample (no change)
  • Organized in Class (e.g. Location, Taxon):
  • Definition of the term: A physical result of a subsampling event. In biological collections, the material sample is typically collected as a subsample from a preserved or living organism, and either preserved or destructively processed. In geological and environmental collections the material sample is typically collected as a subsample of a larger geologic or environmental construct.
  • Usage comments (recommendations regarding content, etc.):
  • Examples: A part of an organism isolated for some purpose. A tissue sample. A soil sample. A marine microbial sample.
  • Refines (identifier of the broader term this term refines, if applicable): None
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): http://rs.tdwg.org/dwc/terms/version/MaterialSample-2018-09-06 (added by @tucotuco)
  • ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG, if applicable): DataSets/DataSet/Units/Unit (added by @tucotuco)

Note: all of the above is my interpretation of the Arctos Working Group conversation.

Jegelewicz avatar Jan 22 '21 14:01 Jegelewicz

I do not agree with this proposal. I think a better approach is to embrace MaterialSample as currently defined, and instead alter the various "BasisOfRecord" terms that are represented as "pseudo-classes" (my term) in DwC.

I do agree that the definition of MaterialSample needs refinement and clarification (especially with respect to the boundary between an instance of Organism and an instance of MaterialSample; I have thoughts on this), but I do not agree that the scope of MaterialSample should be fundamentally altered to imply "sub"sampled material.

I feel strongly that the DwC class MaterialSample should retain its original definition in the broader sense, to represent the entire spectrum of what we used to refer to as "CollectionObjects" -- that is, inclusive of single whole-organism specimens, derivatives of such (as proposed above), and also aggregates of such (lots, soil/water samples with multiple taxa, rocks with multiple embedded fossils and/or freshly collected and encrusted with [recently] living organisms, etc.)

In summary, I agree with the need and justification for a change in DwC to reconcile these terms, but I think the main change should be in the terms PreservedSpecimen, FossilSpecimen, LivingSpecimen. Instead of representing these as distinct classes that are mutually exclusive with respect to each other and to MaterialSample, I think it makes much more sense to regard these three terms as, in effect, subclasses of MaterialSample (mutually-exclusive alternatives of each other, but all within the scope of MaterialSample), and likewise either deprecate the terms HumanObservation and MachineObservation (as discussed at the recent TDWG, the distinction between them is fuzzy at best), or treat them as subclasses of a general Observation class, which itself is mutually exclusive with respect to MaterialSample.

deepreef avatar Jan 23 '21 22:01 deepreef

Agree with @deepreef here. The DINA consortium is in the midst of modelling this and have come to the realization that a catalogued object (= Physical Specimen, Physical Entity) is an instance of a MaterialSample. It may be derived from other instances of MaterialSample (destructively or non-destructively) and may equally produce one or more instances of yet other MaterialSamples as strictly expressed here by @Jegelewicz. And to take this further, Occurrence terms like catalogNumber, otherCatalogNumbers, associatedSequences, and preparations would be better placed under MaterialSample because these have nothing to do with an Occurrence.

dshorthouse avatar Apr 22 '21 02:04 dshorthouse

@dshorthouse : 100% agreement on all of this. We likewise came to the exact same conclusions (including the move of catalogNumber, otherCatalogNumbers, associatedSequences and preparations from Occurrence to MaterialSample).

Obviously it must be true what they say: Great minds think alike. (Or, perhaps, feeble minds think alike? Probably both, and the challenge is figuring out which this represents...)

Incidentally, I would add to the list disposition - as this seems to be more of a property of the physical specimen than the Occurrence instance at which it was extracted from nature.

Another conundrum is how to apply individualCount. As defined, this is clearly a property of Occurrence, but we need a similar property to track number of "units" (for lack of a better term) comprising an instance of MaterialSample as well. The word "individual" harkens back to the old (now deprecated) individualID, which has been replaced by organismID in the Organism class. But we also have organismQuantity, which seems more specific to "The number of individuals represented present at the time of the Occurrence" ("A number or enumeration value for the quantity of organisms."). So... not sure if we can re-purpose individualCount to be something that applies to instances of MaterialSample in this context; or if we need some other way of tracking the "units" of particular instance of MaterialSample. Perhaps this is best handled via MeasurementOrFact instances? There are two separate issues (here and here) about this going on right now...

So many questions....

deepreef avatar Apr 22 '21 03:04 deepreef

And yet another conundrum. What is going on with preparations, especially if it were moved into MaterialSample where it probably belongs? Is it a noun, a verb or a gerund? The examples provided could be interpreted as instances of MaterialSample, aggregations of instances, methods employed to produce them, or descriptions of their preservation media or vessel(s). However, the expectation is a singular materialSampleID, which means we should be obliged to make sense of all the relationships among instances of MaterialSample that share a common provenance by using ResourceRelationship. Some of those relationships will be between MaterialSamples and some of those relationships will be between MaterialSamples and Occurrences, the latter comparable in spirit to what we do with basionyms and their relationship(s) to downstream taxon concepts. And, it's from that particular link that we uncover the collecting event details.

dshorthouse avatar Apr 22 '21 12:04 dshorthouse

@dustymc

Jegelewicz avatar Apr 22 '21 14:04 Jegelewicz

If we extend these realizations to their logical conclusion, we have a problem in how we expect our specimen-based data to be interpreted in the context of an Occurrence. Most (all?) of our aggregators make heavy use of occurrenceID as the canonical anchor for our physical objects that we in the museums community implicitly model as MaterialSample. For us, we're forced to equate occurrenceID and materialSampleID when we share data whereas they are not the same thing. An Occurrence speaks more to an ephemeral, epistemological origin (i.e. basisOfRecord) from which may be derived evidence of past existence manifested as MaterialSample.

The exchange networks of duplicates distributed among herbaria is a concrete example of this. One plant clipped into five pieces, prepared and mounted, each sheet then shipped 'round the world to 5 herbaria. In reality, that's one Occurrence and five MaterialSamples although the participant herbaria have no functional mechanism to produce & share precisely that same progenitor occurrenceID. What they have are their own catalogNumber(s) and vague signals like recordNumber that there was once a unitary Occurrence: one organism at a particular place at a particular time. At present, each herbarium independently creates and attaches a transcribed collecting event (globally plural, locally unique) then shares their data anchored to occurrenceID (globally plural, locally unique) and we solve the problem through yet more abstraction by deploying AI and crafting some clusters with fuzzy edges. But... we're still left with globally plural and locally unique occurrenceIDs for a unitary Occurrence in this example.
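
The herbarium example can be sketched in a few lines. This is only an illustration under stated assumptions: the `SheetRecord` class, the institution names, and the identifier strings are all made up, and grouping on collector plus recordNumber is a deliberately naive stand-in for the fuzzy clustering described above, not how any aggregator actually does it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetRecord:
    """One herbarium sheet as published by one institution (hypothetical shape)."""
    institution: str
    catalog_number: str
    occurrence_id: str   # minted locally, so it differs per herbarium
    recorded_by: str
    record_number: str   # the collector's number, the "vague signal"


# One plant, five sheets, five herbaria, five locally minted occurrenceIDs.
sheets = [
    SheetRecord(
        institution=f"Herbarium{i}",
        catalog_number=f"H{i}-0001",
        occurrence_id=f"urn:example:h{i}:occ:1",
        recorded_by="A. Collector",
        record_number="1234",
    )
    for i in range(1, 6)
]


def cluster_by_collector_number(records):
    """Group records that share a collector and a collector's number."""
    clusters = defaultdict(list)
    for record in records:
        clusters[(record.recorded_by, record.record_number)].append(record)
    return dict(clusters)


clusters = cluster_by_collector_number(sheets)
distinct_occurrence_ids = {record.occurrence_id for record in sheets}

print(len(distinct_occurrence_ids), "published occurrenceIDs")
print(len(clusters), "inferred Occurrence")
```

The five published occurrenceIDs collapse to a single cluster only because the toy data are clean; real transcriptions of collector names and numbers rarely match this neatly.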

dshorthouse avatar Apr 22 '21 14:04 dshorthouse

the participant herbaria have no functional mechanism to produce & share precisely that same progenitor occurrenceID. What they have are their own catalogNumber(s) and vague signals like recordNumber that there was once an Occurrence.

This is a long-standing problem and not just for herbaria. Mammal occurrences end up at different institutions or collections when skins, skeletons and genetic material get separated over the years.

Jegelewicz avatar Apr 22 '21 14:04 Jegelewicz

See https://github.com/ArctosDB/arctos/issues/1966 for another side of the story

Jegelewicz avatar Apr 22 '21 14:04 Jegelewicz

I believe Arctos has all of the "pigeonholing problems" mentioned in this thread.

https://arctos.database.museum/guid/UAM:ES:4588 seems to meet some definitions of "FossilSpecimen" and PreservedSpecimen, and is also cataloged as https://arctos.database.museum/guid/UAM:Mamm:53942.

Many things in herbaria are "LivingSpecimen" pending a little water and sunlight.

catalogNumber and otherCatalogNumbers seem closer to Occurrence than MaterialSample to me, but we could easily map through one more denormalization. (We do have "MaterialSample otherCatalogNumbers" but I don't think they're exposed via DWC.)

https://arctos.database.museum/guid/MVZ:Egg:10460 is more or less another example of "rocks with multiple embedded fossils."

Observation class, which itself is mutually exclusive with respect to MaterialSample.

We have "there was never a physical part" and "someone says there were physical parts, but they are permanently unavailable for various reasons." I do not see much functional distinction.

So many questions....

Yep!

dustymc avatar Apr 22 '21 15:04 dustymc

This is a long-standing problem and not just for herbaria. Mammal occurrences end up at different institutions or collections when skins, skeletons and genetic material get separated over the years.

Nit-picky, but by "occurrence" here, you mean MaterialSample or specimen. Occurrences don't go anywhere. There may have been a single Occurrence - a single organism collected in a single event. But, the parts - the MaterialSamples - are now scattered among many homes. They all have a relationship to that original Occurrence (perhaps through a parent MaterialSample that no longer exists eg carved up in the basement of the Smithsonian from a previously documented MaterialSample) but there are barriers to knowing it, agreeing on it, using it, and then sharing it.

dshorthouse avatar Apr 22 '21 17:04 dshorthouse

Nit-picky, but correct.

Jegelewicz avatar Apr 22 '21 18:04 Jegelewicz

So eDNA are MaterialSamples and not Occurrences? Is it both? When is something not an occurrence? Because eDNA have associatedSequences and isn't all of this wrapped up in the occurrence core anyway? So what does it really mean practically for a term to be "placed under MaterialSample"?

Personally I think a larger community discussion needs to happen around basisOfRecord and what it's intended to convey. I field a lot of questions in the OBIS and GBIF US communities about this term because it's required and has a controlled vocabulary so data providers and managers have to apply it and it isn't really clear how a downstream user will interpret it.

For the Machine Observations TDWG group, especially for biologging data we are using basisOfRecord to distinguish between observations of an animal where the animal is in hand and having a tag placed on it (HumanObservation) versus the subsequent observations of that animal by a machine (MachineObservation).

albenson-usgs avatar Apr 22 '21 18:04 albenson-usgs

Another issue we have grappled with - https://github.com/ArctosDB/arctos/issues/2075

or not finished grappling with....

Jegelewicz avatar Apr 22 '21 18:04 Jegelewicz

@albenson-usgs If we're strict about the definition of an Occurrence then yes, eDNA is an agglomerative MaterialSample. The event portion of the Occurrences (plural) to which that initially single sample is linked is immediately knowable but the organisms (plural) that were bulk sampled may not be.

As for the practicality of where terms are placed in the DwC classes, it has to do with the operational identifiers we attach to these items and what is their cardinality within our collection management systems. If catalogNumber is a property of an Occurrence then that assumes a 1:1 relationship between it and an occurrenceID - they are operationally the same. However, if several different specimens (or their derivatives) each with a different catalogNumber are derived from a single Occurrence with its single event then we may have a problem because under some conditions, we may need to break the cardinality. In other words, GBIF wants a unique occurrenceID but I've got 10 catalogued items that were derived from a single Occurrence so I cannot make them unique and still adhere to the definition of an Occurrence unless I only publish one of them. If I buck the definition and give all of them then I have to make artificial occurrenceIDs, which may mean loss of functional collaboration across collections or across institutions if there was intent to share & reuse those occurrenceIDs. And, GBIF's value is diminished. As many of you have noticed, GBIF now has a clustering algorithm at play for occurrence records. Is it not the intent here to collapse all those disparate, artificially unique occurrenceIDs into canonical Occurrences? If it isn't, then what's the point? Why force us to make these occurrenceIDs unique? Some of us have already done that clustering!
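
The cardinality problem can be made concrete with a small sketch. Everything here is assumed for illustration: the row dictionaries, the `occ:` and `ms:` identifier strings, and the `duplicate_ids` helper are hypothetical, and the uniqueness check is a simplified stand-in for whatever validation an aggregator applies.

```python
from collections import Counter


def duplicate_ids(rows, key="occurrenceID"):
    """Return the values of `key` that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Ten catalogued items derived from a single Occurrence.
items = [{"catalogNumber": f"CAT-{n:03d}"} for n in range(1, 11)]

# Option 1: stay faithful to the definition and share one occurrenceID.
faithful = [dict(item, occurrenceID="occ:1") for item in items]

# Option 2: mint an artificial occurrenceID per catalogued item.
artificial = [
    dict(item, occurrenceID=f"occ:{item['catalogNumber']}") for item in items
]

# Option 3: key each row on a materialSampleID and let occurrenceID repeat.
by_sample = [
    dict(item, materialSampleID=f"ms:{item['catalogNumber']}", occurrenceID="occ:1")
    for item in items
]

print(duplicate_ids(faithful))                        # uniqueness is violated
print(duplicate_ids(artificial))                      # unique, but the shared origin is lost
print(duplicate_ids(by_sample, "materialSampleID"))   # unique on the sample identifier
```

Option 3 keeps the single Occurrence visible while still giving every row a unique key, which is roughly what moving catalogNumber under MaterialSample would allow.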

dshorthouse avatar Apr 22 '21 18:04 dshorthouse

Agree with @dshorthouse. This is highly relevant, as my institution is in the process of setting up an environmental sample/eDNA repository in Arctos, similar to an existing repository at the University of Alaska Museum of the North (https://arctos.database.museum/SpecimenSearch.cfm?guid_prefix=UAM%3AEnv). We are considering including all derived taxonomic IDs and genetic sequences under a single catalog number, as having all been derived from the same occurrence (water sample, soil sample). Alternatively, we could catalog each unique taxonomic OTU separately, and link it back to the original source catalog item via URL relationships. The latter is entirely feasible but much more complex, especially if there are hundreds of OTUs that result from a single eDNA sample. What we really need is a way to designate the original source sample, e.g. the water or soil, with a unique source identifier similar to a dwc:organismID.
Also, our collections have many different examples of catalog items that represent multiple occurrences. These catalog items usually include multiple material samples, e.g. multiple tubes of blood and serum collected from the same animal at different occurrence events. These situations are not hypothetical.

campmlc avatar Apr 22 '21 19:04 campmlc

Hokay... where to begin? (Note to @timrobertson100: Now is the time to go get that cup of tea...)

So, I first climbed into this rabbit hole several years ago, when I started minting materialSampleID identifiers for our specimens. Initially, at least, these had a 1:1 correspondence with occurrenceID values, as presented through DwC. At the time, we had no resources to conduct a major overhaul of our (homegrown) specimen data management systems, but it did trigger a conceptual odyssey that I've been wandering through ever since.

DarwinCore began as a way for the Museum community to share data about preserved specimens (fun fact: the term is credited to Allen Allison, who apparently blurted it out by mistake when he meant to say "Dublin Core" at a ZBIG meeting - or so he tells me). Thus, the original implied basisOfRecord is what we now refer to as PreservedSpecimen. Soon thereafter, it was assumed that the most valuable data extraction from our specimens was in terms of representing points on a map (i.e., distributions of taxa across geography). Non-vouchered observations also represent points on a map, so the implied basisOfRecord was expanded to accommodate what we now refer to as HumanObservation (and in a few cases at the time, what we now refer to as MachineObservation). Accordingly, the core class/term in DwC was changed to Occurrence, as a more general way of representing points on a map.

Somewhere along the way, what we used to think of as "specimens" now became "occurrences", as if they were congruent concepts. But of course, specimens are physical entities with all sorts of properties important to the people who care for them (such as preparations, disposition, etc.), whereas (as @dshorthouse already noted) occurrences are ephemeral things, capturing the abstract idea of an Organism being present in the context of an Event. When MaterialSample was first proposed, it was not (as I recall) an effort to reconcile this logical incongruity. Rather, it was proposed initially to accommodate multi-taxon "gatherings" (e.g., soil, water), which at the time were the basis for the growing notion of eDNA. After some hashing and thrashing on the email discussion forums, the Class was born and now bears the definition "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.", and the examples are: "A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.". In that context, it's kinda hard not to equate MaterialSample with "Specimen".

(I trust @tucotuco or @stanblum or someone else active in early DwC activities will correct any errors in this historical synopsis...)

I've continued to stare at my ceiling late at night (more often than I should probably admit) pondering the essence and meaning of MaterialSample in the context of other DwC classes, but it's gotten a bit more "real" for me recently. We suddenly have a lot more resources to support the digitization of our collections (and, of particular interest for me, integrate collections data and research data more effectively), and so what had been an entirely intellectual exercise to occupy time late at night, in the shower, stuck in traffic, etc., has now become a very specific practical issue for me. Over the next 4 months, I will be updating the core data model behind our collections data, and one of the specific issues that our CMs need to "fix" is the way we track physical objects in our collections -- i.e., as instances of MaterialSample. Indeed, I recently reached out to @tucotuco

A lot of the discussion above focuses on the boundary between Occurrence and MaterialSample. While I agree that is relevant to the extent that many content providers present "specimen" data as instances of Occurrence, and some have therefore (mistakenly, in my view) equated the two concepts, it's also the easy one to deal with. Instances of MaterialSample very clearly represent physical things preserved in collections, whereas instances of Occurrence represent abstract facts concerning the presence of an instance of Organism at an instance of Event. You don't have to go too deep into the conceptual weeds to grasp the fundamental difference between these two concepts.

Much more challenging (for me, at least), is defining the boundary between MaterialSample and Organism. The way I conceptualize an instance of Organism (which intersect with instances of Event via instances of Occurrence, and with instances of Taxon via Identification), is as a conceptual entity (with physical manifestation) that essentially begins when a sperm meets an egg (or when a single-cell organism divides, or whatever mechanism of reproduction is relevant), passes through all manner of metamorphoses over space and time, and then "ends" at some point. One of the key questions is: what marks the end of the existence of an Organism? The two most obvious candidate answers are: death, and disintegration.

This distinction (death vs. disintegration) comes into play when trying to understand the boundary between an instance of Organism, and an instance of MaterialSample. And this is where my intellectual meanderings keep bumping into a wall. In fact, I recently exchanged a series of emails with both @tucotuco and @baskaufs, primarily to ask the questions (among others): Is the TDWG community ready to wrestle with this question? and On what forum should that wrestling take place? Both questions seemed to be answered in the preceding posts on this issue (i.e., "Yes", and "Here").

This post is already too long (even by my standards), so rather than regurgitate all my thinking on this, I'll close by providing a use case, and some follow-up questions.

Use case: A bird is flying across a field, and while traversing a road, gets hit by a car. The driver pulls over, recognizes the bird as something interesting, and contacts the local Museum. The dead bird is then brought to the Museum and given to the VZ CM, who photographs it, assigns a catalog number to it, writes out a label of the pertinent details, and sticks it in a freezer. Some time later, the bird is removed from the freezer, thawed, and prepared for long-term preservation. In keeping with standard protocol, the skin is removed and preserved following one set of protocols, some tissue samples are taken and preserved following another set of protocols, and the remaining tissues are separated from the skeleton and the bones preserved following yet another set of protocols. By traditional practice at the Museum, the same Catalog Number issued to the whole bird is applied to the three separate sets of objects (Skin, Tissue, Skeleton), and one "Specimen" record is created in the database to record all the pertinent information.

I think most would agree that the living bird flying across the field is an instance of an Organism, and that its unpleasant encounter with the car as it flew across the road constitutes an Event, and together this Organism+Event intersection represents an instance of Occurrence. I suspect that most people would also agree that the three preparations derived from that Organism instance represent three instances of MaterialSample.

That's the easy part. But here are the questions to consider:

  1. When did the first MaterialSample instance come into being? The moment the bird encountered the car and it died? The moment the whole bird arrived at the Museum? When it was assigned a catalog number? When it was placed in the freezer (i.e., "preserved")? When the three preparations were created?
  2. Related to this, was the whole bird in the freezer an instance of MaterialSample, serving as a "parent" of the three derived MaterialSample instances (Skin, Tissue, Skeleton)? (perhaps suggesting the need for a new term parentMaterialSampleID?)
  3. Did the instance of Organism cease to exist when the bird's heart stopped beating? Does it continue to exist as a physical entity after the three preparations are created (and after the remaining tissue material disintegrates)? Does it continue to exist as a conceptual entity until all of the physical matter that comprised it fully decomposes?
  4. Related to the above: what is the semantic relationship between an instance of MaterialSample and an instance of Organism? Something like "isDerivedFrom"?
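
To make questions 2 and 4 concrete, here is a minimal sketch of one possible answer, in which the frozen whole bird is itself a MaterialSample and the parent of the three preparations. It is not a settled position. The `parent_id` field stands in for the hypothetical parentMaterialSampleID floated above (not an existing DwC term), and every identifier is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaterialSample:
    material_sample_id: str
    preparation: str
    parent_id: Optional[str] = None     # hypothetical parentMaterialSampleID
    organism_id: Optional[str] = None   # recorded only on the root sample here


samples = {
    s.material_sample_id: s
    for s in [
        MaterialSample("ms:bird-whole", "whole animal (frozen)", organism_id="org:bird-1"),
        MaterialSample("ms:bird-skin", "skin", parent_id="ms:bird-whole"),
        MaterialSample("ms:bird-tissue", "tissue", parent_id="ms:bird-whole"),
        MaterialSample("ms:bird-skeleton", "skeleton", parent_id="ms:bird-whole"),
    ]
}


def lineage(sample_id, samples):
    """Walk from a sample up through its parents; returns IDs, child first."""
    chain = []
    current = samples.get(sample_id)
    while current is not None:
        chain.append(current.material_sample_id)
        current = samples.get(current.parent_id) if current.parent_id else None
    return chain


def source_organism(sample_id, samples):
    """Return the organismID recorded on the root of a sample's lineage."""
    root_id = lineage(sample_id, samples)[-1]
    return samples[root_id].organism_id


print(lineage("ms:bird-skin", samples))
print(source_organism("ms:bird-tissue", samples))
```

If the whole bird is instead treated as the Organism rather than as a parent MaterialSample, the root row disappears and each preparation would carry the organismID directly. The sketch does not decide between the two readings.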

I have my own thoughts on answers to these (and other) questions, but obviously this post is already WAY too long!

Note: several more posts came in as I was writing this, and I continue to agree 100% with the assertions of @dshorthouse.

deepreef avatar Apr 22 '21 19:04 deepreef

@deepreef Regardless of the persistence of the Organism, the "identifier" associated with this organism absolutely has to persist as a linking parent identifier with all subsequent derived parts and preservations, material sample or otherwise, including and especially, parasites and tissues and sequences and media that are deposited in other collections and institutions and repositories, to track these back to the source organism and occurrence. This is also true for source/parent material such as soil/water etc. for eDNA, which technically is not an "organism" but which also has the same need to track parent/child relationships from a source collection object and occurrence.

campmlc avatar Apr 22 '21 19:04 campmlc

@campmlc :

Regardless of the persistence of the Organism, the "identifier" associated with this organism absolutely has to persist as a linking parent identifier with all subsequent derived parts and preservations, material sample or otherwise, including and especially, parasites and tissues and sequences and media that are deposited in other collections and institutions and repositories, to track these back to the source organism and occurrence.

I ABSOLUTELY agree! I was focused more on the conceptual entity of the Organism instance. We need to understand what the "thing" is before we can correctly represent the semantic/cardinality relationships between a digital record (and identifier) representing an Organism, and the other digital identifiers we mint for other classes of "things".

deepreef avatar Apr 22 '21 20:04 deepreef

We are considering including all derived taxonomic IDs and genetic sequences under a single catalog number, as having all been derived from the same occurrence (water sample, soil sample).

I'm not following this. An occurrence is the observation of a taxon at a place and time. What you are talking about here (to me) is an event. The occurrences are the OTUs or taxa that you detected in the event. For me I can't understand how this is one occurrence. This is an event with many occurrences.

What we really need is a way to designate the original source sample, e.g. the water or soil, with a unique source identifier similar to an dwc:organism ID.

I don't understand why you wouldn't use an eventID for this. The event being a sample of water collected at a place and time.

albenson-usgs avatar Apr 22 '21 20:04 albenson-usgs

If we're strict about the definition of an Occurrence then yes, eDNA is an agglomerative MaterialSample

Ok but the associatedSequences is basically the identification of the occurrences. Using @deepreef's logic above it's the intersection of Taxon via Identification that tells you there was an Occurrence which is part of an Event.

albenson-usgs avatar Apr 22 '21 21:04 albenson-usgs

This is an event with many occurrences.

Aha! What I think you mean here is an event with many "things". You can't have an occurrence without an event - they are inextricably linked. It is equally incongruous to imagine an occurrence with many events unless we invoke quantum entanglement. The nut @deepreef is getting us to crack is what are these "things"? We could call them MaterialSamples and there appears to be reason for doing so especially when split & scatter protocols are employed.

dshorthouse avatar Apr 22 '21 21:04 dshorthouse

What I think you mean here is an event with many "things".

But the things are not MaterialSamples because the material sample is also the event (a sample of water, a sample of soil). The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

albenson-usgs avatar Apr 22 '21 21:04 albenson-usgs

An occurrence is the observation of a taxon at a place and time.

Yep - https://dwc.tdwg.org/terms/#occurrence

This is an event with many occurrences.

I agree - the trouble is, we don't do a good job of this. Events don't have identifiers that are shared by everyone and it is VERY easy to end up with multiple interpretations of a single event.

Jegelewicz avatar Apr 22 '21 21:04 Jegelewicz

@albenson-usgs :

Ok but the associatedSequences is basically the identification of the occurrences. Using @deepreef's logic above it's the intersection of Taxon via Identification that tells you there was an Occurrence which is part of an Event.

Yes -- this is another one of the conundrums. Per DwC definition of associatedSequences:

  • A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of genetic sequence information associated with the Occurrence.

This is another example of a term currently nested within the Occurrence class, that doesn't belong there. The question is: where does it belong? An argument can be made that the sequences are really associated with the Organism that held the genome from which the sequences were derived. Another argument can be made that the sequences are associated with the MaterialSample, extracted from the Organism, from which the actual sequence was created. But I don't think a case can be made that associatedSequences are properties of an Occurrence. Unlike properties like sex, lifestage and others, which change over the course of the ever-changing essence of an Organism over the course of its existence (and therefore need to be anchored to a moment in time, or captured in the form of an Event), the DNA sequences derived from an Organism are the same across its lifetime.

But that does not address your point, which brings in Taxon and Identification. The latter is the intersection between the former and an instance of Organism. As such, a DNA sequence can serve as "Evidence" (not yet a DwC class -- but perhaps there is a need for it) for a taxonomic Identification, but it is not, strictly speaking, the Taxon itself (nor the Identification itself). Going a step further, I wouldn't call the sequences themselves instances of MaterialSample; I think they're more analogous to an Image or other form of multimedia, essentially serving as some sort of "representation" of the Organism, derived directly from a MaterialSample (e.g., tissue sample, or water/soil sample).

But the things are not MaterialSamples because the material sample is also the event (a sample of water, a sample of soil). The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

I don't view samples of water or soil as "Events", any more than I view specimens as "Events". This diagram of Darwin-SW is very helpful, I think, in showing the semantic relationships among many of the core DwC classes. Unfortunately, it doesn't include a node for MaterialSample (the purpose of my previous epically-long post was an attempt to start figuring out exactly where MaterialSample would fit in this graph -- my current thinking is somehow embedded within "Token", aka "Evidence").

In summary, Events are independent of any Organism (or derivatives of organisms). They are essentially a moment in space-time (intersection of Location with a timestamp, plus some other properties). An Occurrence is the intersection of an Event and an Organism. The intersection of an Organism and a Taxon is an Identification. By my thinking, all the other stuff we traffic in (PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialCitation, etc.) all represent forms of "Evidence" that support either the truth of an Occurrence instance, or the veracity of an Identification instance; but also are intrinsic things that exist independently of these evidentiary roles.
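A minimal sketch of the relationships summarized above, written as plain Python dataclasses. This is not an existing DwC schema or library; the class layout, field names, and all example values (the bird's identifier, the place, the date, the species name) are hypothetical and only illustrate which class intersects with which.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Location:
    name: str


@dataclass(frozen=True)
class Event:
    # A moment in space-time: a Location plus a timestamp.
    location: Location
    timestamp: datetime


@dataclass(frozen=True)
class Organism:
    organism_id: str


@dataclass(frozen=True)
class Taxon:
    scientific_name: str


@dataclass(frozen=True)
class Occurrence:
    # The intersection of an Event and an Organism.
    event: Event
    organism: Organism


@dataclass(frozen=True)
class Identification:
    # The intersection of an Organism and a Taxon.
    organism: Organism
    taxon: Taxon


@dataclass(frozen=True)
class Evidence:
    # "Evidence" is not a DwC class; `kind` holds labels such as
    # "PreservedSpecimen" or "HumanObservation".
    kind: str
    supports: tuple  # Occurrence and/or Identification instances


# Hypothetical example values, for illustration only.
bird = Organism("bird-1")
field_event = Event(Location("field by the road"), datetime(2021, 4, 22, 8, 0))
occurrence = Occurrence(field_event, bird)
identification = Identification(bird, Taxon("Turdus migratorius"))

# One specimen can support both the Occurrence and the Identification,
# while still existing as an object in its own right.
specimen = Evidence("PreservedSpecimen", (occurrence, identification))
```

Note that Event carries no reference to any Organism, which is the sense in which Events are independent of organisms in this framing.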

I would definitely conceptualize PreservedSpecimen and FossilSpecimen as examples of MaterialSample; but it's less clear to me whether LivingSpecimen is best framed as an instance of MaterialSample, or Organism, or both.

Food for thought: take my use-case of the bird, and imagine that prior to flying across the field and being hit by a car, it lived in a Zoo. Was it a MaterialSample when it was in the Zoo, before it escaped, flew across the field, then got hit by the car? If so, was it the same instance of MaterialSample before it ended up in the Museum freezer? And does it matter whether it was conceived and born in the Zoo? What if it was collected in the wild?

What I'm trying to get at is the "essence" of an instance of MaterialSample -- ultimately to define it, but even before that, I'd like to know it when I see it (with apologies to Justice Potter Stewart).

deepreef avatar Apr 22 '21 21:04 deepreef

The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

This is an interesting one & permit my adventurous thought experiment. What if your eDNA sample came from a river? And, after the data are worked-up, you get "cougar" as a hit among all the other microorganisms. What the heck? Turns out, through radio collar data, you discover that there's another Occurrence record captured that clearly shows a cougar on the river bank some time prior to you scooping your water sample. She cut her gums on a fish she was eating. You could argue that, through calculating the speed of water & rolling back the clock, the Occurrence records represented by your eDNA and that of the radio collar data are precisely the same. They are merely lines of evidence, derived from precisely the same event and precisely the same animal. It's just that the motions of the water added a bit of noise to your appreciation of time. Now what? Surely we need something to differentiate what we have. One record was derived from a radio collar & one was derived from eDNA but, crucially, we do REALLY want to make the joins between these things because there's a story. Is there one Occurrence here or two?

dshorthouse avatar Apr 22 '21 21:04 dshorthouse

In summary, Events are independent of any Organism (or derivatives of organisms). They are essentially a moment in space-time (intersection of Location with a timestamp, plus some other properties). An Occurrence is the intersection of an Event and an Organism. The intersection of an Organism and a Taxon is an Identification. By my thinking, all the other stuff we traffic in (PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialCitation, etc.) all represent forms of "Evidence" that support either the truth of an Occurrence instance, or the veracity of an Identification instance; but also are intrinsic things that exist independently of these evidentiary roles.

This makes all kinds of sense to me.

Jegelewicz avatar Apr 22 '21 21:04 Jegelewicz

Is there one Occurrence here or two?

if

An Occurrence is the intersection of an Event and an Organism.

It seems like there is ONE. The machine observation and material sample are evidence for it.

Although, technically, unless the bleeding coincides EXACTLY with the radio collar ping, maybe not?
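A rough sketch of that caveat, assuming each line of evidence has already been reduced to an organism, a place, and a time (for the eDNA, the time back-calculated from water speed). The record type, the `same_occurrence` helper, the tolerance, and every value below are hypothetical; it also assumes we already know both records refer to the same animal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EvidenceRecord:
    basis: str            # e.g. "MachineObservation" or "MaterialSample"
    organism_id: str      # assumes the individual is already known
    place: str
    observed_at: datetime


def same_occurrence(a: EvidenceRecord, b: EvidenceRecord,
                    tolerance: timedelta) -> bool:
    """Treat two records as evidence for one Occurrence when they share
    an organism and place and their times differ by at most `tolerance`."""
    return (
        a.organism_id == b.organism_id
        and a.place == b.place
        and abs(a.observed_at - b.observed_at) <= tolerance
    )


# Hypothetical values: a collar ping and a back-calculated eDNA time.
collar = EvidenceRecord("MachineObservation", "cougar-1", "river bank",
                        datetime(2021, 4, 22, 9, 0))
edna = EvidenceRecord("MaterialSample", "cougar-1", "river bank",
                      datetime(2021, 4, 22, 9, 20))

exact_match = same_occurrence(collar, edna, timedelta(0))
loose_match = same_occurrence(collar, edna, timedelta(minutes=30))
```

With zero tolerance the two records count as two Occurrences; with a 30-minute tolerance they count as one. So "one or two" depends on how precisely the moment of an Event is defined.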

Jegelewicz avatar Apr 22 '21 21:04 Jegelewicz

Although, technically, unless the bleeding coincides EXACTLY with the radio collar ping, maybe not?

Way to rain on my parade, @Jegelewicz.

dshorthouse avatar Apr 22 '21 22:04 dshorthouse

Is there one Occurrence here or two?

Ok yes I see this as one occurrence. So you're saying that the associatedSequences for this Organism being in the MaterialSample Class will help us make the link between these two occurrences which are really one made by two different sampling methods?

For reference, in the OBIS world there is a real world problem like this coming from EurOBIS where you have an ARMS sampling as well as eDNA sampling happening at the same location and time and therefore you may have evidence of the same occurrence coming from different sampling methods.

albenson-usgs avatar Apr 22 '21 22:04 albenson-usgs

On 2021-04-22 4:54 PM, David Shorthouse wrote:

One record was derived from a radio collar & one was derived from eDNA but, crucially, we do REALLY want to make the joins between these things because there's a story. Is there one Occurrence here or two?

At first, I would think ONE, but aren't you giving us a story that may have a different explanation? What if there's another uncollared cougar? How can we be certain (unless the collared cougar has a known DNA sample to match to the eDNA) that it was the same cougar?

You could say the evidence suggests they are one and the same cougar, given two pieces of information. But you could also wonder if there's another cougar?

debpaul avatar Apr 22 '21 22:04 debpaul