New term - environmentalMaterial

Open tucotuco opened this issue 10 years ago • 13 comments

  • Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data
  • Justification (why is this term necessary?): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424
  • Proponents (at least two independent parties who need this term): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424.

Proposed attributes of the new term:

  • Term name (in lowerCamelCase): environmentalMaterial
  • Class (e.g. Location, Taxon): Event
  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Recommended best practice is to use a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO). Values are to represent media as being composed primarily of the named entity, rather than restricted entirely to that entity. For example, "envo:liquid water" is to be understood as "environmental material composed primarily of some chebi:water" in liquid form.
  • Examples: envo:soil, envo:sediment, envo:saline water
  • Refines (identifier of the broader term this term refines, if applicable):
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable):
  • ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable): not in ABCD

Original first comment:

Was https://code.google.com/p/darwincore/issues/detail?id=191

Reported by gtuco.btuco, Sep 25, 2013

==New Term Recommendation==

Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data

Justification: see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424

  • Term Name: environmental material
  • Identifier: http://purl.obolibrary.org/obo/ENVO_00010483
  • Namespace: http://purl.obolibrary.org/obo/
  • Label: Environmental Material
  • Definition: Material in or on which organisms may live.
  • Comment: Examples: "scum", "http://purl.obolibrary.org/obo/ENVO_00003930". For discussion see https://code.google.com/p/darwincore/wiki/Event (there will be no further documentation here until the term is ratified)
  • Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
  • Refines:
  • Status: proposed
  • Date Issued: 2013-09-25
  • Date Modified: 2013-09-25
  • Has Domain:
  • Has Range:
  • Refines:
  • Version: http://purl.obolibrary.org/obo/ENVO_00010483
  • Replaces:
  • IsReplaceBy:
  • Class: http://rs.tdwg.org/dwc/terms/Event
  • ABCD 2.0.6: not in ABCD (someone please confirm or deny this)

Sep 26, 2013 #1 gtuco.btuco

Based on initial discussions on tdwg-content, modified the proposal to make a new DwC property term that recommends the ENVO class as the range, as follows:

  • Term Name: environmentalMaterial
  • Identifier: http://rs.tdwg.org/dwc/terms/environmentalMaterial
  • Namespace: http://rs.tdwg.org/dwc/terms/
  • Label: Environmental Material
  • Definition: Material in or on which organisms may live. Recommended best practice is to use a controlled vocabulary such as defined by the environmental feature class of the Environment Ontology (ENVO).
  • Comment: Examples: "scum", "http://purl.obolibrary.org/obo/ENVO_00003930". For discussion see https://code.google.com/p/darwincore/wiki/Event (there will be no further documentation here until the term is ratified)
  • Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
  • Refines:
  • Status: proposed
  • Date Issued: 2013-09-26
  • Date Modified: 2013-09-26
  • Has Domain:
  • Has Range:
  • Refines:
  • Version: environmentalMaterial-2013-09-26
  • Replaces:
  • IsReplaceBy:
  • Class: http://rs.tdwg.org/dwc/terms/Event
  • ABCD 2.0.6: not in ABCD (someone please confirm or deny this)

tucotuco avatar Nov 13 '14 16:11 tucotuco

See also Issue #37, Issue #38, and Issue #39.

tucotuco avatar Mar 04 '15 22:03 tucotuco

Opened public discussion on tdwg-content (http://lists.tdwg.org/pipermail/tdwg-content/2015-March/003507.html).

tucotuco avatar Mar 26 '15 20:03 tucotuco

This proposal already passed through public review in 2015 without objections; however, it is not clear that demand has been demonstrated.

tucotuco avatar Sep 09 '20 17:09 tucotuco

There was a public review of this and related proposals in 2015 in which there were observations that the proposal as presented does not make sense. The ENVO classes cannot be Darwin Core properties. Instead, new properties would have to be minted for Darwin Core with the recommendation to have the range of values come from ENVO classes. In any case, there is no evidence in the discussion history of demand for these terms. If anyone wants to move this proposal forward, please provide a new term definition addressing the property/class issue and provide evidence of sufficient demand.

tucotuco avatar Sep 09 '20 17:09 tucotuco

I was in error to note that there was a need for a demonstration of demand. The proposal was a direct result of an international workshop. Also, the revised term proposal has already been made; an updated comment shows just the proposal.

tucotuco avatar Sep 09 '20 19:09 tucotuco

The definitive term change proposal under consideration is at the beginning of the first comment in this issue.

Updated term change request:

  • Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data
  • Justification (why is this term necessary?): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424
  • Proponents (at least two independent parties who need this term): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424.

Proposed attributes of the new term:

  • Term name (in lowerCamelCase): environmentalMaterial
  • Class (e.g. Location, Taxon): Event
  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Values are to represent media as being composed primarily of the named entity, rather than restricted to that entity. For example, "ENVO:water" is to be understood as "environmental material composed primarily of some CHEBI:water". Recommended best practice is to use a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO).
  • Examples: envo:soil, envo:sediment, envo:saline water
  • Refines (identifier of the broader term this term refines, if applicable):
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable):
  • ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable): not in ABCD

tucotuco avatar Sep 09 '20 19:09 tucotuco

@pbuttigieg Would you be willing to pre-assess this proposal, as it has been a long time in the making? Does it still make sense as proposed?

tucotuco avatar Apr 19 '21 02:04 tucotuco

This term should have a dwciri: analog. Here is what I believe would be appropriate metadata for dwciri:environmentalMaterial:

  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Values are to represent media as being composed primarily of the named entity, rather than restricted to that entity. For example, "ENVO:water" is to be understood as "environmental material composed primarily of some CHEBI:water". Recommended best practice is to use an IRI from a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO).
  • Examples: http://purl.obolibrary.org/obo/ENVO_00001998, http://purl.obolibrary.org/obo/ENVO_00002007, http://purl.obolibrary.org/obo/ENVO_00002010

I actually have some questions about the usage comments here and in the proposal text, but I will put that in a different comment box.

baskaufs avatar Apr 22 '21 01:04 baskaufs

I have some real questions about the usage comments and examples.

  1. In the usage comments, you use ENVO:water with "ENVO" capitalized, while in the examples you use envo:soil with "envo" in lower case. The use of case should be consistent.

  2. The "controlled vocabulary" examples are fraught with problems. Based on the form of the examples, here is what I think you are actually saying: "take the string envo: (which looks like a namespace abbreviation, but isn't actually defined anywhere) and concatenate it to the English label that is currently being used for the class term in the ENVO ontology." In the TDWG universe, we routinely use compact URIs, or "CURIEs", as a shorthand to abbreviate a term IRI. For example, we use dwc:country as an abbreviation for http://rs.tdwg.org/dwc/terms/country. That works because dwc: is a well-known namespace abbreviation for http://rs.tdwg.org/dwc/terms/ and, since TDWG uses non-opaque local identifiers like country in its IRIs, people can pretty much "read" the CURIE and know what it means.

But for better or worse, OBO ontologies use opaque local identifiers. I'm not sure what the consensus namespace abbreviation is for ENVO. I suppose it might be envo: = http://purl.obolibrary.org/obo/ENVO_ or maybe ENVO: = http://purl.obolibrary.org/obo/ENVO_. But if that is the case, then the CURIE for soil would be envo:00001998, not envo:soil. As far as I know, envo:soil isn't anything real, other than a guess at a namespace abbreviation appended to a label.
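To make the CURIE point concrete, here is a minimal Python sketch of mechanical CURIE expansion. The prefix map is an assumption: since no namespace abbreviation for ENVO is defined in the proposal, the `envo` entry is only the guess discussed above, and `expand_curie` is a hypothetical helper, not part of any TDWG or OBO tooling.

```python
# Hypothetical prefix map. The "envo" entry is a guess, not an agreed abbreviation.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "envo": "http://purl.obolibrary.org/obo/ENVO_",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a CURIE by swapping its prefix for the namespace IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local_id


print(expand_curie("dwc:country"))    # http://rs.tdwg.org/dwc/terms/country
print(expand_curie("envo:00001998"))  # http://purl.obolibrary.org/obo/ENVO_00001998
print(expand_curie("envo:soil"))      # http://purl.obolibrary.org/obo/ENVO_soil
```

Under this assumed prefix, `envo:soil` expands to an IRI ending in `ENVO_soil`, which is not the IRI that the opaque identifier `envo:00001998` expands to. That is the sense in which the label-based form is not a real CURIE.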

This problem is illustrated with the ENVO:water example. As far as I can tell, "water" in ENVO is http://purl.obolibrary.org/obo/ENVO_00002006. But if you actually go to the page for the term: http://purl.obolibrary.org/obo/ENVO_00002006, you see that the label used there is actually "liquid water". So following the pattern in the examples, the "controlled value" should be envo:liquid water or maybe ENVO:liquid water, but not ENVO:water. If you put "water" into the search box, you get this as the second result:

http://purl.obolibrary.org/obo/ENVO_00002006 (ENVO):
*  water in Ontobee: ENVO
*  liquid water in Ontobee: ENVO

That implies that ENVO accepts "water" and "liquid water" as alternate labels. Can people just pick which one they like better to use as the "controlled" value?

The problem is that ENVO is an ontology and not actually a controlled vocabulary, and in trying to use it as one, we are conflating IRIs (in the form of CURIEs), labels, and controlled value strings, which are all actually different from each other.

It seems to me that it would make more sense if we want people to use ENVO terms as controlled values to just have them use the English label as shown on the term page. That would make the values in the example be: liquid water, soil, sediment, and saline water without any pseudo-namespaces. Of course that is a problem if people mix in other ontologies besides ENVO and use non-unique labels. A non-ambiguous solution would be to use dwciri:environmentalMaterial with a full IRI value from ENVO, but that would be opaque and I suppose people would not like that.

Another alternative, which in my opinion would probably be the best, would be to just go ahead and make a real controlled vocabulary that specifies the required controlled value strings. The definitions could still be linked to ENVO. For an example, see the draft controlled vocabulary for subjectPart that we are completing in Audubon Core. In that controlled vocabulary, we link each controlled vocabulary term to an ontology definition from OBO, but explicitly specify the controlled value string to be used, following the convention of camelCase with no spaces. This would not be that hard to implement if you really want people to use those ENVO subclasses -- it would just be a matter of setting up a table similar to the example I provided.
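As a rough illustration of what such a table could look like, the following Python sketch derives a camelCase controlled value string from each English label and maps it to the ENVO IRI. The exact normalization rules are an assumption (here: split on anything that is not a letter or digit, lower-case the first word, capitalize the rest), and `label_to_controlled_value` is a hypothetical helper. The two IRIs are the ones cited earlier in this thread for liquid water and soil.

```python
import re


def label_to_controlled_value(label: str) -> str:
    """Turn an English label into a lowerCamelCase string with no spaces."""
    # Assumed rule: any run of letters/digits is a word; punctuation is dropped.
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"no usable characters in label {label!r}")
    first, rest = words[0].lower(), words[1:]
    return first + "".join(w[:1].upper() + w[1:].lower() for w in rest)


# (IRI, label) pairs taken from the discussion above.
TERMS = [
    ("http://purl.obolibrary.org/obo/ENVO_00002006", "liquid water"),
    ("http://purl.obolibrary.org/obo/ENVO_00001998", "soil"),
]

# Controlled value string -> ontology IRI that defines it.
vocabulary = {label_to_controlled_value(label): iri for iri, label in TERMS}
print(vocabulary)
```

Generating the strings once and publishing the resulting table would fix a single value per term, rather than leaving each data provider to derive one.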

But the currently proposed design pattern is just asking for people to effectively be guessing or making up their own "controlled" values.

baskaufs avatar Apr 22 '21 02:04 baskaufs

I have just spent some additional time investigating the possibilities of auto-generating controlled value strings from labels using data acquired straight from Ontobee using a SPARQL query. You can run my test at the Ontobee endpoint.

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Direct subclasses of the ENVO environmental material class, with their labels
SELECT DISTINCT ?subclass ?label
WHERE {
  BIND(<http://purl.obolibrary.org/obo/ENVO_00010483> AS ?rootIri)
  ?subclass rdfs:subClassOf ?rootIri .
  ?subclass rdfs:label ?label .
  # Keep only classes whose IRI contains "ENVO"
  FILTER(CONTAINS(STR(?subclass), "ENVO"))
  # Drop classes flagged as deprecated
  MINUS { ?subclass owl:deprecated "true"^^xsd:boolean . }
}

This query gets the IRIs and labels for direct subclasses of the environmental material class. One should be able to extend this to all child subclasses using the property path operator * like this:

?subclass rdfs:subClassOf* ?rootIri.

but doing so results in this error: "Exceeded 1000000000 bytes in transitive temp memory." That seems pretty weird to me, since there is a finite number of subclasses and it shouldn't be that hard to just get them all with their labels. Perhaps someone better at SPARQL can figure out what the problem is. If I have time, I may just try to download the whole ontology and load it into a local SPARQL endpoint to see if it works better.
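One possible workaround, offered as an untested assumption rather than a fix for the endpoint error, is to skip the property path and walk the hierarchy one level at a time, reusing the direct-subclass query for each class found. The Python sketch below shows only the traversal: `fetch_direct_subclasses` is a hypothetical callable standing in for a request to the endpoint, and the `ex:` identifiers in the toy hierarchy are made up.

```python
from collections import deque
from typing import Callable, Iterable


def all_subclasses(
    root: str,
    fetch_direct_subclasses: Callable[[str], Iterable[str]],
) -> set[str]:
    """Breadth-first walk collecting every descendant of root (root excluded)."""
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in fetch_direct_subclasses(parent):
            # Skip classes already visited so cycles cannot loop forever.
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Toy hierarchy with made-up identifiers, including a deliberate cycle.
TOY = {
    "ex:material": ["ex:water", "ex:soil"],
    "ex:water": ["ex:salineWater"],
    "ex:salineWater": ["ex:water"],
}

print(sorted(all_subclasses("ex:material", lambda iri: TOY.get(iri, []))))
```

Unlike `rdfs:subClassOf*`, which also matches the root class itself through the zero-length path, this walk returns only the descendants.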

I did learn a few useful things from this exercise. One is that there is some inconsistency in how the labels are expressed. Some are plain literals, some are language-tagged (@en) literals, and some are literals datatyped as xsd:string, with duplication across these three categories. So some de-duplicating would be required after getting all of the labels.
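A minimal Python sketch of that de-duplication step, assuming the query results have already been parsed into tuples of IRI, lexical form, language tag, and datatype. The tuple layout and the sample rows are illustrative assumptions, not actual output from the endpoint.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
WATER = "http://purl.obolibrary.org/obo/ENVO_00002006"

# Illustrative rows: the same label as a plain, language-tagged, and typed literal.
rows = [
    (WATER, "liquid water", None, None),
    (WATER, "liquid water", "en", None),
    (WATER, "liquid water", None, XSD_STRING),
]


def dedupe_labels(rows):
    """Collapse rows that differ only in language tag or datatype."""
    seen = set()
    unique = []
    for iri, lexical, _lang, _datatype in rows:
        key = (iri, lexical)
        if key not in seen:
            seen.add(key)
            unique.append(key)  # keep first-seen order
    return unique


print(dedupe_labels(rows))
```

Keying on the IRI together with the lexical form keeps genuinely different labels for the same class, so any alternate labels would still have to be reconciled by hand.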

The other thing is that there are just many, many values here, including many obscure things like "bacon curing brine", "flue gas desulfurization material", and "congelation ice in a fresh water body". That means that the problem of proliferation of label variants will be particularly acute in this case if we depend on people constructing their own controlled values from label strings.

baskaufs avatar Apr 22 '21 13:04 baskaufs

@baskaufs Thanks for the thorough investigation of the proposal. It is interesting to see the issues that arise from what seems like a natural extension of capabilities by invoking an ontology as a source for a controlled vocabulary. It sounds like the route of defining a controlled vocabulary coupled to ontology definitions is the sensible way to go, but I wouldn't suggest going that far in this proposal. No one has requested it and the work would be tremendous. We have a couple of alternatives. One is to abandon the proposal, especially since there hasn't been any expressed interest since the 2013 meeting that generated it. Another alternative is to modify the proposal to be less prescriptive about the vocabulary to use, specifically, "Recommended best practice is to use a controlled vocabulary. Values are to represent the environmental material as being composed primarily of the named entity, rather than restricted entirely to that entity. For example, 'liquid water' is to be understood as 'environmental material composed primarily of water in liquid form'."

tucotuco avatar Apr 29 '21 02:04 tucotuco

@tucotuco I think that the mechanism I suggested for creating controlled values is viable -- see the suggestion I made for values for the dwc:biome proposal. However, in this case, it seems to me that the real issue is that there are just so many subclasses of the environmental material class that it is not reasonable to suggest that they could be used to create a manageable controlled vocabulary. I would suggest shelving this proposal until its proponents suggest a viable mechanism for managing a controlled vocabulary for the property. If nobody can successfully do that, I would say this proposal should be considered unimplementable.

baskaufs avatar Apr 29 '21 12:04 baskaufs

The Darwin Core Maintenance Group feels that this proposal has not reached a sufficient state of maturity and recommends that a Task Group be formed to develop solutions to the issues raised.

tucotuco avatar Apr 30 '21 17:04 tucotuco