provide per-package slims of ENVO for env triads

Open cmungall opened this issue 4 years ago • 5 comments

This is something we are doing in NMDC that we want to push up to the standards level

Submitters often find it difficult to select the correct ENVO terms. This is compounded by the lack of suitable ontology browsing tools and by the prevalence of spreadsheet-based data submission, as opposed to the dedicated tools with intelligent, context-aware support for term selection that we see in other areas of biocuration. It is also made harder by ENVO's move away from a system in which each term came from one of three hierarchies. Things are more open-ended now, which leads to more submitter/annotator confusion. This is evident from the extremely poor quality of ENVO annotations in INSDC.

As a partial solution we should have recommended slims for each package/field combination. Submitters/annotators can still select terms outside these slims, but the slims would serve as the starting point. Even if submitters restrict themselves to the recommended terms, I hypothesize the gain in accuracy would vastly outweigh the loss in precision.

I suggest a 3 column format

  • package
  • field (env_X)
  • valid ENVO term

An entry in this table means that the ENVO term is valid for the package/field combination
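
To make the table concrete, here is a minimal Python sketch of loading such a 3-column file and checking a term against it. The column headers, the function names, and the two rows are illustrative assumptions, not a vetted slim; the field names are the existing MIxS env triad fields.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only, not a vetted slim.
# Columns: package, field (env_X), valid ENVO term.
SLIM_TSV = (
    "package\tfield\tenvo_term\n"
    "soil\tenv_broad_scale\tENVO:00000446\n"  # terrestrial biome
    "soil\tenv_medium\tENVO:00001998\n"       # soil
)

def load_slim(text):
    """Map each (package, field) pair to its set of recommended ENVO term IDs."""
    slim = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        slim[(row["package"], row["field"])].add(row["envo_term"])
    return dict(slim)

def is_recommended(slim, package, field, term):
    """Return True if the term is listed for this package/field combination."""
    return term in slim.get((package, field), set())

slim = load_slim(SLIM_TSV)
print(is_recommended(slim, "soil", "env_medium", "ENVO:00001998"))
print(is_recommended(slim, "soil", "env_medium", "ENVO:00000446"))
```

Because the slim is only a recommendation, a check like this would more naturally produce a warning than a hard validation failure.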

We could also have:

  • package
  • field (env_X)
  • valid ENVO term
  • ENVO local name

If we want to rename some of the more abstract ENVO labels in a local context
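
A rough sketch of how the optional fourth column could be used when showing terms to a submitter: fall back to the ENVO label unless a local name is given. The rows, the local name "land biome", and the helper name are hypothetical and only there to illustrate the idea.

```python
# Hypothetical rows: (package, field, ENVO term, local name or None).
ROWS = [
    ("soil", "env_medium", "ENVO:00001998", None),
    ("soil", "env_broad_scale", "ENVO:00000446", "land biome"),  # made-up local name
]

# ENVO's own labels for the terms above; in practice these would come from the ontology.
ENVO_LABELS = {
    "ENVO:00001998": "soil",
    "ENVO:00000446": "terrestrial biome",
}

def display_label(package, field, term):
    """Return the local name if one is set for this row, otherwise the ENVO label."""
    for pkg, fld, trm, local_name in ROWS:
        if (pkg, fld, trm) == (package, field, term):
            return local_name or ENVO_LABELS[term]
    raise KeyError(f"{term} is not in the slim for {package}/{field}")

print(display_label("soil", "env_medium", "ENVO:00001998"))
print(display_label("soil", "env_broad_scale", "ENVO:00000446"))
```

The ENVO ID stays the stored value either way; only the text shown to the submitter changes.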

(this format also cleanly maps to the LinkML YAML format, which is how I envision us maintaining this moving forward)
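
As one possible reading of that mapping, the sketch below groups the table rows into one LinkML-style enum per package/field pair, using `permissible_values` with a `meaning` CURIE for each term. The enum naming scheme, the function name, and the rows are assumptions for illustration, not the layout MIxS or NMDC actually use. The output is printed as JSON, which is also valid YAML, to keep the example standard-library only.

```python
import json

# Illustrative rows only: (package, field, ENVO term).
ROWS = [
    ("soil", "env_broad_scale", "ENVO:00000446"),
    ("soil", "env_medium", "ENVO:00001998"),
]

# ENVO labels used as the permissible value text.
LABELS = {
    "ENVO:00000446": "terrestrial biome",
    "ENVO:00001998": "soil",
}

def to_linkml_enums(rows, labels):
    """Group rows into one enum per package/field, keyed by a hypothetical name."""
    enums = {}
    for package, field, term in rows:
        enum_name = f"{package}_{field}_enum"  # naming scheme is an assumption
        enum = enums.setdefault(enum_name, {"permissible_values": {}})
        enum["permissible_values"][labels[term]] = {"meaning": term}
    return {"enums": enums}

schema_fragment = to_linkml_enums(ROWS, LABELS)
print(json.dumps(schema_fragment, indent=2))
```

A slot for a given package could then point at the matching enum, which is also what a spreadsheet dropdown would be generated from.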

This can also be easily implemented via dropdowns in spreadsheets

We in NMDC can get us started with a selection for soil

Note that as tooling becomes more sophisticated we can have less primitive ways of guiding users to the right terms but we have to start with something that works within the current tooling ecosystem

cmungall avatar Mar 10 '21 19:03 cmungall

Can this be ready to implement in MIxS 6 by May?

lschriml avatar Mar 10 '21 19:03 lschriml

I love the idea and agree it would probably make a huge difference to ease of use. However, I think it's a massive undertaking to generate all the required slims and have them vetted by relevant user groups for every environmental package before May. We can make a start as soon as anyone has the bandwidth to do so, but I think it's unrealistic to have it ready for public consumption by May. We should schedule its release for MIxS v7 instead. Can the suggested slims be treated in the same way as our controlled vocabulary fields? Or, to put it another way, can our other controlled vocabulary fields use the same technology as these slims? After all, a CV is just a slim of the English language!

only1chunts avatar Mar 11 '21 10:03 only1chunts

This makes sense and connects to some of our subsets in ENVO.

I agree with @only1chunts that this is more likely to be a MIxS 7 target. However, I think we should release a general suggestion for further revision, rather than wait for full consensus.

pbuttigieg avatar Mar 16 '21 16:03 pbuttigieg

Sounds good.

lschriml avatar Mar 16 '21 16:03 lschriml

NMDC did this for soil, sediment, water, and plant extensions. I think it's safe to say we'd love to have this for other extensions, but it's not a small lift!

mslarae13 avatar Jun 09 '25 20:06 mslarae13