
Proposed ENVO simple layer

Open cmungall opened this issue 2 years ago • 5 comments

As part of the NEON/NMDC collaboration we have been mapping NEON terms to ENVO. The team needed a concept "deciduous forest", and the closest they could find was "area of deciduous forest", so this was used.

However, this leaves us in an odd position. If we look at the placement of "area of deciduous forest" in comparison to other more specific "deciduous forest" terms:

[Screenshot: placement of "area of deciduous forest" relative to the other "deciduous forest" terms]

We see that they are in completely separate branches. We would like all the terms we use to be in a single is-a/part-of hierarchy, so that we can do standard ontology operations such as providing faceted browsing, rolling up terms for analyses, etc.

The two branches here also surface a number of other unusual decisions.

  • "tropical broadleaf forest biome" is not classified under biome, despite the name (UPDATE fixed via logical defs)
  • "biome" is an ecosystem, yet "terrestrial biome" is not a "terrestrial ecosystem" (UPDATE fixed via logical defs)
  • none of the terms under "forest ecosystem" are a part of the biosphere, which seems like an accidental omission (UPDATE not yet fixed)
  • none of the terms under "forest ecosystem" have is-a or part-of relationships to planet or astronomical body part (UPDATE partially fixed)
  • we have duplicative concepts like "tropical broadleaf forest" and "tropical broadleaf forest biome" with little guidance on when to choose one over the other for annotation
  • there is at least one case of a deciduous forest classified under coniferous forest
  • the general design patterns and classification rules are not clear. Observe how the tropical branch differs from the temperate one.

In tropical, we have "tropical {...} broadleaf forest IS_A tropical broadleaf forest":

[Screenshot: tropical branch, "tropical {...} broadleaf forest IS_A tropical broadleaf forest"]

But this is inconsistent with "temperate {...} broadleaf forest PART_OF temperate broadleaf forest":

[Screenshot: temperate branch, "temperate {...} broadleaf forest PART_OF temperate broadleaf forest"]
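A minimal Python sketch of a lint that could surface this kind of inconsistency: for parents whose labels share a suffix, it collects the relation used to attach each direct child and reports when parallel branches use different relations. The edge list is a hypothetical stand-in for the tropical and temperate branches described above (the child labels are illustrative, not actual ENVO labels or IDs); a real check would read the edges from the ontology instead of a hard-coded list.

```python
from collections import defaultdict

# (child, relation, parent) triples; hypothetical labels for illustration only.
EDGES = [
    ("tropical moist broadleaf forest", "is_a", "tropical broadleaf forest"),
    ("tropical dry broadleaf forest", "is_a", "tropical broadleaf forest"),
    ("temperate deciduous broadleaf forest", "part_of", "temperate broadleaf forest"),
    ("temperate evergreen broadleaf forest", "part_of", "temperate broadleaf forest"),
]


def relations_by_parent(edges, suffix):
    """For each parent whose label ends with `suffix`, collect the relations
    used to attach its direct children."""
    out = defaultdict(set)
    for _child, relation, parent in edges:
        if parent.endswith(suffix):
            out[parent].add(relation)
    return dict(out)


def inconsistent(edges, suffix):
    """True if parallel parents (same label suffix) use different relations."""
    used = set()
    for relations in relations_by_parent(edges, suffix).values():
        used |= relations
    return len(used) > 1


if __name__ == "__main__":
    for parent, rels in sorted(relations_by_parent(EDGES, "broadleaf forest").items()):
        print(parent, sorted(rels))
    print("inconsistent:", inconsistent(EDGES, "broadleaf forest"))
```

Grouping by label suffix is a crude heuristic chosen for brevity; with consistent logical definitions, the same check could group parents by their genus instead.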

It would be useful if we could somehow tease apart the core concepts in ENVO from whether something is considered an area, an ecosystem, a biome, or an astronomical body part. For annotating sample sources, and for many other applications, users would appreciate something that looks more like anatomy ontologies, GO, etc.: less duplication of concepts, with the core concepts in one hierarchy that follows standard classification patterns.

In this classification we would have a general concept like "forest", and then a fairly consistent lattice of subtypes (no part-ofs) for the various ways a forest can be classified: tropical vs temperate, needleleaf vs broadleaf, deciduous vs non-deciduous...
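As a sketch of what such a lattice could look like, the Python below generates forest subtypes from three independent facets (climate, phenology, leaf type) and gives every class only is_a parents: the classes obtained by dropping one facet at a time. The facet values, the label template, and the helper names `build_lattice` and `ancestors` are assumptions made for illustration, not an existing ENVO design pattern.

```python
from itertools import combinations, product

# Hypothetical facets; values and word order are illustrative assumptions.
FACETS = {
    "climate": ["tropical", "temperate"],
    "phenology": ["deciduous", "evergreen"],
    "leaf": ["broadleaf", "needleleaf"],
}
ORDER = ["climate", "phenology", "leaf"]  # word order used in generated labels
ROOT = "forest"


def label(assignment):
    """Build a label such as 'temperate deciduous forest' from facet values."""
    words = [assignment[f] for f in ORDER if f in assignment]
    return " ".join(words + [ROOT])


def build_lattice():
    """Return {label: set of direct is_a parents} for every facet combination."""
    is_a = {ROOT: set()}
    for k in range(1, len(ORDER) + 1):
        for facets in combinations(ORDER, k):
            for values in product(*(FACETS[f] for f in facets)):
                assignment = dict(zip(facets, values))
                parents = set()
                # Dropping one facet at a time yields the direct generalizations.
                for dropped in facets:
                    rest = {f: v for f, v in assignment.items() if f != dropped}
                    parents.add(label(rest))
                is_a[label(assignment)] = parents
    return is_a


def ancestors(is_a, cls):
    """All classes reachable from cls by following is_a edges upward."""
    seen, stack = set(), [cls]
    while stack:
        for parent in is_a[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    lattice = build_lattice()
    print(len(lattice))  # 27
    print(sorted(ancestors(lattice, "temperate deciduous needleleaf forest")))
```

Because every edge is is_a, "temperate deciduous needleleaf forest" reaches "deciduous forest", "needleleaf forest", "temperate forest", and "forest" without any part-of hops, which is the kind of roll-up the proposal asks for.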

We would reserve parthood for things that are clearly composition: a forest is made of trees, a canopy is part of a forest, the forest is part of the terrestrial composition of the earth, etc.

Then for groups that need to distinguish area vs ecosystem vs biome vs feature, these could be added as separate branches; however, the simple core could still be extracted and it would form a coherent consistent core.

cmungall, Jul 11 '23

I notice that ENVO has a forest ecosystem term. Should deciduous forest be a subclass of that?

On a more general note, I agree with you that we need design patterns that avoid defining things as immaterial entity.

wdduncan, Jul 11 '23

@wdduncan thanks for your comments!

> I notice that ENVO has a forest ecosystem term. Should deciduous forest be a subclass of that?

The essence of my proposal is that we would like to see a class hierarchy organized something like this:

  • forest
    • deciduous forest
      • temperate deciduous needleleaf forest
      • ...

We are neutral as to whether these are conceived of as ecosystems, biomes, or something else. This would also work for us:

  • forest ecosystem
    • deciduous forest ecosystem
      • temperate deciduous needleleaf forest ecosystem
      • ...

Note that for brevity I am only showing one path here but this could potentially be a lattice.

The main point of my proposal is that the concepts annotators need should be in one hierarchy, rather than "deciduous forest" being in one hierarchy, "temperate deciduous needleleaf forest ecosystem" in another, etc.

> On a more general note, I agree with you that we need design patterns that avoid defining things as immaterial entity.

To be clear, I am not proposing that we avoid adding immaterial entities. I would prefer to see the use cases for these clearly articulated, but I trust these use cases exist.

We just need to capture the base concepts in a single hierarchy.

Right now, an annotator who wants to annotate with a "deciduous forest" concept is likely to pick "area of deciduous forest" (this has indeed happened in the NEON annotation). This means that our annotations are a pick-and-mix of different branches, and roll-up queries don't work as expected: e.g., annotations to "deciduous forest" do not roll up to "forest ecosystem", and annotations to "temperate deciduous needleleaf forest ecosystem" do not roll up to "deciduous forest".
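The roll-up failure can be reproduced with a toy graph. The sketch below assumes hypothetical labels and sample IDs (not real ENVO IDs or NEON annotations). It propagates each annotation to all of its is_a/part_of ancestors and counts annotations per term, once for a split graph and once for a single hierarchy.

```python
from collections import Counter

# Toy graphs: child -> parents (is_a and part_of both followed for roll-up).
# Labels and sample IDs are hypothetical, not real ENVO IDs or NEON data.
SPLIT = {
    "area of deciduous forest": {"area"},
    "temperate deciduous needleleaf forest ecosystem": {"forest ecosystem"},
    "forest ecosystem": {"ecosystem"},
}
SINGLE = {
    "deciduous forest": {"forest"},
    "temperate deciduous needleleaf forest": {"deciduous forest"},
    "forest": set(),
}
SPLIT_ANNOTATIONS = [
    ("sample1", "area of deciduous forest"),
    ("sample2", "temperate deciduous needleleaf forest ecosystem"),
]
SINGLE_ANNOTATIONS = [
    ("sample1", "deciduous forest"),
    ("sample2", "temperate deciduous needleleaf forest"),
]


def closure(graph, term):
    """Return term plus every ancestor reachable through the graph."""
    seen, stack = {term}, [term]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(graph, annotations):
    """Count annotations per term, propagating each one to all ancestors."""
    counts = Counter()
    for _sample, term in annotations:
        counts.update(closure(graph, term))
    return counts


if __name__ == "__main__":
    print(roll_up(SPLIT, SPLIT_ANNOTATIONS)["forest ecosystem"])  # 1
    print(roll_up(SINGLE, SINGLE_ANNOTATIONS)["forest"])  # 2
```

In the split graph only one of the two forest samples reaches "forest ecosystem"; in the single hierarchy both reach "forest" and "deciduous forest".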

cmungall, Jul 12 '23

@cmungall I agree with your reasoning about the advantages of a single hierarchy.

> To be clear, I am not proposing that we avoid adding immaterial entities. I would prefer to see the use cases for these clearly articulated, but I trust these use cases exist.

I didn't think you were. This is my opining ;)

Perhaps my use of 'avoid' sounded too strong. Here is all I meant.

It seems to me that immaterial entities (and ICEs) generate a number of these shadow issues. Consider, for example, how we would represent the northern hemisphere or Eastern Europe. Such entities invite the use of immaterial entity (especially among the more philosophically/BFO-minded developers; I think I may still be included in this group). This, in turn, increases the chance of creating shadow hierarchies.

It would be better (again my opinion) if the default stance was to classify things as material entities, and use immaterial entity only when a clear and motivating case can be made.

wdduncan, Jul 12 '23

After a long querying and manual filtering exercise over terms from the environmental system, environmental zone, layer, and astronomical body part (ABP) hierarchies, I narrowed them down to around 1164 terms in ~100 categories and found the following. Shown below are the 28 most problematic concepts, which are found only or mostly in some combination of biome, layer, environmental zone, and ecosystem (or others), but not in the astronomical body part hierarchy. ABP is the existing hierarchy that comes closest to a single branch holding the bulk of terms. I think cleaning up all terms associated with these concepts would help with the vision of a single hierarchy for curators.

| ENVO parent | Category |
|---|---|
| ecosystem | bog |
| layer | canopy |
| geographic feature and ABP | cut |
| ecosystem | ecotone |
| Various | estuarine |
| ecosystem | farm |
| layer | floor |
| Various | forest |
| layer | front |
| Various | grassland |
| Various | marine |
| ecosystem | marsh |
| ecosystem | meadow |
| ecosystem | mire |
| Various: geographic feature, biome, and layer | neritic |
| environmental zone | oasis |
| Various: ecosystem and ABP | palsa |
| Various: biome and layer | pelagic |
| ecosystem | plantation |
| environmental zone | plate |
| environmental zone | scrubland |
| biome | shrubland |
| Various | swamp |
| Various | tidal/tide |
| Various | tundra |
| Various: layer and environmental zone | vegetation |
| Various | wetland |
| Various | woodland |
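One way to express the "only or mostly" filter, sketched in Python: given rows that already assign each term a concept and a top-level hierarchy, flag the concepts whose share of terms under ABP is at or below a threshold. The rows are illustrative, the grouping of terms into concepts is assumed to have been done beforehand, and this is not the actual query used for the table above.

```python
from collections import Counter, defaultdict

ABP = "astronomical body part"

# (term label, concept, top-level hierarchy); illustrative rows only.
TERMS = [
    ("bog ecosystem", "bog", "ecosystem"),
    ("forest canopy", "canopy", "layer"),
    ("forest ecosystem", "forest", "ecosystem"),
    ("forest biome", "forest", "biome"),
    ("marsh ecosystem", "marsh", "ecosystem"),
    ("salt marsh ecosystem", "marsh", "ecosystem"),
    ("marsh", "marsh", ABP),
    ("mountain", "mountain", ABP),
]


def flag_concepts(rows, max_abp_share=0.0):
    """Return {concept: sorted hierarchies} for concepts whose share of
    terms under ABP is at most max_abp_share (0.0 means 'never in ABP')."""
    totals, in_abp = Counter(), Counter()
    hierarchies = defaultdict(set)
    for _label, concept, hierarchy in rows:
        totals[concept] += 1
        hierarchies[concept].add(hierarchy)
        if hierarchy == ABP:
            in_abp[concept] += 1
    return {
        concept: sorted(hierarchies[concept])
        for concept in totals
        if in_abp[concept] / totals[concept] <= max_abp_share
    }


if __name__ == "__main__":
    print(flag_concepts(TERMS))  # strict: never under ABP
    print(flag_concepts(TERMS, max_abp_share=0.5))  # "mostly" outside ABP
```

Raising `max_abp_share` above zero is what turns "only" into "mostly"; the right cutoff is a curation judgment rather than something the code can decide.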

kaiiam, Jul 12 '23

OAK report here:

envo-ldef-rpt.tsv.txt

I have extracted a subset of it here:

  • this is based on simple lexical patterns (the axiomatization is too inconsistent to drive this)
  • id and label show the "atomic concept" (if id is null the atomic concept cannot be found)
  • further columns represent "derived concepts"
| id | label | derived concepts (environment, ecosystem, biome, area) | num_concepts |
|---|---|---|---|
| ENVO:01001357 | desert | ENVO:01001780, *ENVO:01000179/GEN, +ENVO:00000097/DF | 4 |
| None | freshwater | ENVO:01000306, ENVO:01001789, ENVO:00000873 | 3 |
| None | aquatic | ENVO:01000317, ENVO:01001787, ENVO:00002030 | 3 |
| ENVO:01000206 | temperate | ENVO:01001705, ENVO:01001831 | 3 |
| None | marine | ENVO:01000320, ENVO:01001788, ENVO:00000447 | 3 |
| ENVO:01000205 | subtropical | ENVO:01001702, ENVO:01001832 | 3 |
| ENVO:01000204 | tropical | ENVO:01001701, ENVO:01001830 | 3 |
| ENVO:01000238 | polar | ENVO:01001703, ENVO:01000339 | 3 |
| ENVO:01000251 | subpolar | ENVO:01001704, ENVO:01001834 | 3 |
| None | alpine tundra | ENVO:01001371, ENVO:01001505, ENVO:03400001 | 3 |
| None | grassland | ENVO:01001206, ENVO:01000177, ENVO:00000106 | 3 |
| None | cropland | ENVO:01001244, ENVO:01000245, ENVO:01000892 | 3 |
| None | woodland | ENVO:01001245, ENVO:01000175, ENVO:00000109 | 3 |
| None | tundra | ENVO:01001370, ENVO:01000180, ENVO:00000112 | 3 |
| ENVO:01000431 | mixed forest | *ENVO:01000198/GEN, ENVO:01000855 | 3 |
| ENVO:00002010 | saline water | ENVO:01000307 | 2 |
| ENVO:00002149 | sea water | ENVO:01000321 | 2 |
| ENVO:00002019 | brackish water | ENVO:01000322 | 2 |
| ENVO:00005791 | sterile water | ENVO:01001042 | 2 |
| ENVO:00002012 | hypersaline water | ENVO:01001043 | 2 |
| ENVO:00001998 | soil | ENVO:01001044 | 2 |
| ENVO:00002007 | sediment | ENVO:01001048 | 2 |
| ENVO:00010505 | aerosol | ENVO:01001052 | 2 |
| UBERON:0015474 | axilla skin | ENVO:08000001 | 2 |
| PATO:0001429 | acidic | +ENVO:01000315/DF | 2 |
| PATO:0001430 | alkaline | ENVO:01000316 | 2 |
| UBERON:0002416 | integumental system | ENVO:2100004 | 2 |
| UBERON:0000022 | feather | ENVO:2100006 | 2 |
| ENVO:00005801 | rhizosphere | ENVO:01000999 | 2 |
| UBERON:0001555 | digestive tract | ENVO:01001033 | 2 |
| UBERON:0001474 | bone element | ENVO:01001306 | 2 |
| UBERON:0001062 | anatomical entity | ENVO:2100000 | 2 |
| UBERON:0000160 | intestine | ENVO:2100002 | 2 |
| ENVO:01000112 | polymetallic nodule | ENVO:01001629 | 2 |
| None | swamp | ENVO:00000233, ENVO:01001208 | 2 |
| None | wetland | ENVO:01001209, ENVO:00000043 | 2 |
| None | forest | ENVO:01001243, ENVO:01000174 | 2 |
| None | polar tundra | ENVO:01001625, ENVO:03400002 | 2 |
| None | terrestrial | ENVO:01001790, ENVO:00000446 | 2 |
| ENVO:01000143 | marine reef | ENVO:01000029 | 2 |
| ENVO:01000122 | marine hydrothermal vent | +ENVO:01000030/DF | 2 |
| ENVO:00000015 | ocean | ENVO:01000048 | 2 |
| ENVO:01000150 | marine subtidal rocky reef | +ENVO:01000050/DF | 2 |
| ENVO:01000161 | marine sponge reef | +ENVO:01000123/DF | 2 |
| ENVO:01001802 | subtropical moist broadleaf forest | *ENVO:01000226/GEN | 2 |
| ENVO:00000021 | freshwater lake | ENVO:01000252 | 2 |
| ENVO:01000297 | freshwater river | ENVO:01000253 | 2 |
| None | gramanoid or herbaceous vegetation | ENVO:01000888 | 1 |
| None | lichen-dominated vegetation | ENVO:01000889 | 1 |
| None | moss-dominated vegetation | ENVO:01000890 | 1 |
| None | pastureland or hayfields | ENVO:01000891 | 1 |
| None | woody wetland | ENVO:01000893 | 1 |
| None | emergent herbaceous wetland | ENVO:01000894 | 1 |
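The lexical grouping described in the bullets can be sketched roughly as follows. This is not OAK's implementation: the label templates ("X environment", "X ecosystem", "X biome", "area of X") are assumptions chosen to mirror the column names, the IDs are placeholders rather than ENVO IDs, and the */GEN and +/DF markers in the report are not modeled. Rows are keyed by the atomic label, `id` is None when no term carries that bare label, and `num_concepts` counts the non-null entries, which appears to match how the table counts them.

```python
import re

# Assumed label templates, one per derived-concept column.
PATTERNS = {
    "environment": re.compile(r"^(?P<x>.+) environment$"),
    "ecosystem": re.compile(r"^(?P<x>.+) ecosystem$"),
    "biome": re.compile(r"^(?P<x>.+) biome$"),
    "area": re.compile(r"^area of (?P<x>.+)$"),
}

# Placeholder IDs and labels, not real ENVO terms.
LABELS = {
    "EX:1": "desert",
    "EX:2": "desert ecosystem",
    "EX:3": "desert biome",
    "EX:4": "area of desert",
    "EX:5": "freshwater environment",
    "EX:6": "freshwater biome",
}


def build_report(labels):
    """Group terms by atomic label; return {atomic label: row dict}."""
    id_by_label = {lbl: term_id for term_id, lbl in labels.items()}
    rows = {}
    for term_id, lbl in labels.items():
        for column, pattern in PATTERNS.items():
            match = pattern.match(lbl)
            if match:
                atomic = match.group("x")
                row = rows.setdefault(
                    atomic, {"id": id_by_label.get(atomic), "label": atomic}
                )
                row[column] = term_id
    for row in rows.values():
        # Count the atomic concept (if found) plus each derived concept.
        keys = ["id", *PATTERNS]
        row["num_concepts"] = sum(1 for k in keys if row.get(k))
    return rows


if __name__ == "__main__":
    report = build_report(LABELS)
    for row in sorted(report.values(), key=lambda r: -r["num_concepts"]):
        print(row)
```

Purely lexical matching misses terms whose labels don't follow a template, which is one reason the bullets note that the axiomatization would be a better driver if it were consistent.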

cmungall, Aug 30 '23