Updates to terms 16S recovered [MIXS:0000065] and 16S recovery software [MIXS:0000066]

Open only1chunts opened this issue 1 year ago • 15 comments

Should we expand these terms to enable reporting of the recovery of other taxonomic marker genes, not just 16S?

Current term details of 16S recovered [MIXS:0000065]

name: x16s_recover
description: Can a 16S gene be recovered from the submitted SAG or MAG?
title: 16S recovered
examples:
- value: 'yes'
in_subset:
- sequencing
from_schema: https://w3id.org/mixs
keywords:
- recover
slot_uri: MIXS:0000065
alias: x16s_recover
domain_of:
- Mimag
- Misag
range: boolean

Suggested update(s)

The parts that could be updated are listed below, but further input from experts in the area is required to determine the most suitable way to make the changes.

name: what should the new short name be? tax_mark_gene_recov
description: Can a specific taxonomic marker gene be recovered from the submitted SAG or MAG?
title: what should the new full name be? taxonomic marker gene recovery

Additional context

These terms are bacteria-specific. They are used in the MISAG and MIMAG packages, and @rdfinn has previously suggested that those packages need to be more generic to enable inclusion of non-bacterial MAGs/SAGs (see ticket #602). Can we therefore only specify when a 16S gene is recovered from an assembled genome? I think we should expand the term to enable reporting of recovery of other taxonomic marker genes.

Question: If this term is used should it be a requirement for the term "target gene" [MIXS:0000044] to also be included? i.e. together this boolean term yes/no and the gene name from the target gene field give sufficient information.

Question: If this term is used should it be a requirement for the term MIXS:0000066 to be used as well? (i.e. the software term below)

Current term details of 16S recovery software [MIXS:0000066]

name: x16s_recover_software
description: Tools used for 16S rRNA gene extraction
title: 16S recovery software
examples:
- value: rambl;v2;default parameters
in_subset:
- sequencing
from_schema: https://w3id.org/mixs
keywords:
- recover
- software
slot_uri: MIXS:0000066
alias: x16s_recover_software
domain_of:
- Mimag
- Misag
range: string
pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$
structured_pattern:
  syntax: ^{software};{version};{parameters}$
  interpolated: true
  partial_match: true
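
As a quick sanity check, the `pattern` above can be exercised with Python's `re` module. This is a minimal sketch: the regex is copied verbatim from the slot definition, while the helper name `is_valid_software_value` and the two malformed strings are illustrative assumptions, not part of MIxS.

```python
import re

# Regex copied verbatim from the `pattern` field of x16s_recover_software.
SOFTWARE_PATTERN = re.compile(
    r"^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);"
    r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+);"
    r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$"
)


def is_valid_software_value(value: str) -> bool:
    """Return True if the value matches the pattern above (hypothetical helper)."""
    return SOFTWARE_PATTERN.match(value) is not None


# The example value from the term definition.
print(is_valid_software_value("rambl;v2;default parameters"))
# Hypothetical malformed values: a missing field, then a leading space.
print(is_valid_software_value("rambl;v2"))
print(is_valid_software_value(" rambl;v2;default parameters"))
```

The first call should pass and the other two should fail, because the pattern requires three semicolon-separated fields that each start and end with a character that is neither whitespace nor a hyphen.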

Suggested update(s)

The parts that could be updated are listed below, but further input from experts in the area is required to determine the most suitable way to make the changes.

name: what should the new short name be?
description: Tools used for taxonomic marker gene extraction/recovery.
title: what should the new full name be?

only1chunts avatar Jun 11 '24 16:06 only1chunts

Proposal from CIG call 25 Jun:

  • Rename 16S recover [MIXS:0000065] -> target gene recovered (& change the short name too.)
  • Rename 16S recovery software [MIXS:0000066] -> target gene recovery software (& change the short name too.)

only1chunts avatar Jun 25 '24 14:06 only1chunts

  • I checked with the MGnify group and they are very pleased that these changes are progressing. (I just saw Rob's email address above as a requestor!)
  • Making it more generic makes sense to me too.

Woolly-at-EBI avatar Jul 01 '24 17:07 Woolly-at-EBI

Is there a general policy on changing the meaning of a term?

Making it broader is safe, as metadata annotated using the old term will still be valid after the change. However, making the meaning of a term broader can lose information, for example which marker gene was involved.

In OBO, changing the meaning of a term necessitates obsoleting the ID, making a new one, and adding a broader-than relationship between the new and old terms. This may be too impractical for MIxS, however. But it would be good to have general guidelines.

cmungall avatar Jul 01 '24 21:07 cmungall

I like the idea of making the term more generic and applicable to more marker genes. However, previously it was clear what was targeted. How can this be implemented now? Do we support some kind of controlled vocabulary, or would that make it limited again?

jjkoehorst avatar Jul 08 '24 06:07 jjkoehorst

All of this would be so much easier if we didn't expect the values of all MIxS terms to be strings or lists of strings

turbomam avatar Jul 08 '24 18:07 turbomam

Obsolete the old terms, with a property that points to the new replacement term, so as to maintain the link between the terms.

only1chunts avatar Jul 23 '24 14:07 only1chunts

Discussed at CIG 2024-07-23

  • Deprecate these terms
  • Add new terms and provide "replaced by" or "consider" to map the new terms to the old terms.

Next steps:

  • Finalize the attribute that we will use for matching the old terms to the replaced terms
  • Document a "how to replace a term" protocol for GSC
    • Make sure term changes are backwards compatible
  • Create draft PR with the changes proposed & updates to review and approve with CIG

(I started this comment and got distracted. This is all I can recall at the moment and will add more later if I think of what I missed)

mslarae13 avatar Jul 23 '24 15:07 mslarae13

Additional note, keep to <20 characters for slots

From above

CIG call 25 Jun: Rename 16S recover [MIXS:0000065] -> target gene recovered (& change the short name too.) Rename 16S recovery software [MIXS:0000066] -> target gene recovery software (& change the short name too.)

mslarae13 avatar Nov 12 '24 16:11 mslarae13

see also https://github.com/GenomicsStandardsConsortium/mixs/pull/671/files for some precedent

turbomam avatar May 28 '25 13:05 turbomam

I will link this issue to a branch for creating new terms with see_also relationships to the old terms, which should be deprecated in another issue/branch/PR

turbomam avatar May 28 '25 13:05 turbomam

It doesn't look like the 16S terms have ever been used in INSDC Biosamples, either with or without the x prefix!

db.biosamples_attributes.countDocuments({
  $or: [
    { "attribute_name": { $in: ["16s_recover", "x16s_recover", "16s_recover_software", "x16s_recover_software"] } },
    { "harmonized_name": { $in: ["16s_recover", "x16s_recover", "16s_recover_software", "x16s_recover_software"] } }
  ]
})
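
For anyone who wants to rerun this count from Python, the same filter can be built as a plain dictionary. This is a sketch only: the helper name `build_name_filter` is made up here, and the commented-out PyMongo call assumes a `db` handle to a database that contains the same `biosamples_attributes` collection.

```python
from typing import Any


def build_name_filter(names: list[str]) -> dict[str, Any]:
    """Match documents whose attribute_name or harmonized_name is in names."""
    return {
        "$or": [
            {"attribute_name": {"$in": names}},
            {"harmonized_name": {"$in": names}},
        ]
    }


# The four spellings checked in the query above.
NAMES = ["16s_recover", "x16s_recover", "16s_recover_software", "x16s_recover_software"]
query = build_name_filter(NAMES)

# With PyMongo and a connected `db` (an assumption, not shown here):
# count = db.biosamples_attributes.count_documents(query)
print(query)
```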

turbomam avatar May 28 '25 13:05 turbomam

Summary of Instructions for Transforming 16S-Centric Terms into Marker-Gene Terms

Key Transformations:

  1. Broaden scope: Change from 16S-specific to generic "taxonomic marker gene" terms
  2. New URIs: Use sequential next available URIs (MIXS:0001337, MIXS:0001338) instead of reusing existing ones
  3. New names:
    • x16s_recover -> marker_gene_recov
    • x16s_recover_software -> marker_gene_recov_sw
  4. Expand examples: Add broader taxonomic marker gene tool examples (barrnap, metaxa2) to complement the original rambl example (Claude's addition, not requested)

LinkML Structure Changes:

  1. Remove: alias, domain_of, from_schema, and pattern fields (keep structured_pattern... will require expansion in some future step)
  2. Add: see_also references to original URIs (MIXS:0000065, MIXS:0000066)
  3. Add: aliases field linking to original 16S term names
  4. Keep: All other fields (examples, structured_pattern, etc.)
  5. Format: Wrap in proper slots: block

Content Updates:

  1. Descriptions: Rewrite following MIxS issue #645 guidelines:

    • Use definitional voice (declarative fragments)
    • Follow Aristotelian form where practical
    • Avoid interrogative forms
    • Don't repeat term names
    • 10-40 words, no articles at beginning
  2. Examples: Add broader tool examples (barrnap, metaxa2) beyond just rambl (Claude's addition, not requested)
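
The mechanical parts of the description guidelines above (word count, no leading article, no question form) could be checked automatically. The following is a rough sketch under stated assumptions: the function name `description_problems` is hypothetical, a trailing "?" is used as a crude proxy for interrogative form, and the stylistic rules (definitional voice, Aristotelian form, not repeating term names) are not checked at all.

```python
ARTICLES = {"a", "an", "the"}


def description_problems(description: str) -> list[str]:
    """List violations of the mechanical description rules (hypothetical helper)."""
    problems = []
    words = description.split()
    if not 10 <= len(words) <= 40:
        problems.append(f"word count {len(words)} is outside 10-40")
    if words and words[0].lower() in ARTICLES:
        problems.append("starts with an article")
    # Heuristic only: a trailing question mark is treated as interrogative form.
    if description.rstrip().endswith("?"):
        problems.append("interrogative form")
    return problems


# The two original 16S descriptions quoted earlier in this issue.
print(description_problems("Can a 16S gene be recovered from the submitted SAG or MAG?"))
print(description_problems("Tools used for 16S rRNA gene extraction"))
```

Under these checks, the original `x16s_recover` description is flagged only for its question form, and the original `x16s_recover_software` description only for being shorter than 10 words.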

The goal was to maintain backward compatibility through aliases and see_also while expanding the terms' applicability beyond just 16S to all taxonomic marker genes (18S, ITS, COI, etc.).
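
The structural changes listed above can be sketched as a small dictionary transformation. This is illustrative only and is not the code used in the PR: the function name `derive_marker_gene_slot`, the new title, and the placeholder description are assumptions, and the summary does not say which of the two new URIs belongs to which term, so pairing MIXS:0001337 with `marker_gene_recov` here is also an assumption.

```python
from copy import deepcopy

# Fields the summary says to drop from the old definitions.
DROPPED_FIELDS = ("alias", "domain_of", "from_schema", "pattern")


def derive_marker_gene_slot(old_slot, new_name, new_uri, new_title, new_description):
    """Build a `slots:` block for a new term derived from an old slot dict."""
    new_slot = deepcopy(old_slot)  # leave the caller's dict untouched
    old_name = new_slot.pop("name")
    old_uri = new_slot["slot_uri"]
    for field in DROPPED_FIELDS:
        new_slot.pop(field, None)
    new_slot.update(
        title=new_title,
        description=new_description,
        slot_uri=new_uri,
        see_also=[old_uri],   # link back to the original URI
        aliases=[old_name],   # keep the original 16S name discoverable
    )
    return {"slots": {new_name: new_slot}}


# Old definition as quoted at the top of this issue.
old = {
    "name": "x16s_recover",
    "description": "Can a 16S gene be recovered from the submitted SAG or MAG?",
    "title": "16S recovered",
    "examples": [{"value": "yes"}],
    "in_subset": ["sequencing"],
    "from_schema": "https://w3id.org/mixs",
    "keywords": ["recover"],
    "slot_uri": "MIXS:0000065",
    "alias": "x16s_recover",
    "domain_of": ["Mimag", "Misag"],
    "range": "boolean",
}

new_block = derive_marker_gene_slot(
    old,
    new_name="marker_gene_recov",
    new_uri="MIXS:0001337",                      # assumed pairing, see above
    new_title="marker gene recovered",           # hypothetical title
    new_description="(placeholder description)", # to be written per #645
)
print(new_block)
```

Everything not named in the "Remove" or "Add" lists (examples, in_subset, keywords, range) is carried over unchanged, which matches the "Keep" item.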

turbomam avatar May 28 '25 13:05 turbomam

@jjkoehorst you expressed a concern about open-ended representations of the marker gene that was detected. I want to point out that the two terms initially addressed in this issue were

  • was a marker gene (like 16S SSU) detected?
  • by what software and in what configuration

So with the x16s_recover -> marker_gene_recov change in my PR, we don't have any place to say what the target/marker gene was. That might justify adding a new slot, possibly with a controlled vocabulary

turbomam avatar May 28 '25 15:05 turbomam

To recap, there are 2 parts here: (1) MIXS:0000065 "x16s_recover" and (2) MIXS:0000066 "x16s_recover_software". Below I split these into separate topics that can be converted to sub-issues for actioning:

  • [ ] deprecate MIXS:0000065 in favour of MIXS:0000044
    The agreement is to deprecate MIXS:0000065 (x16s_recover) in favour of a new term with a more generic name. However, we already have the term MIXS:0000044 (target_gene), which is defined as "Targeted gene or locus name for marker gene studies" with the given examples: 16S rRNA, 18S rRNA, nif, amoA, rpo. To me that sounds like EXACTLY what we're talking about here, so I suggest we just deprecate MIXS:0000065 with a notification that it is replaced by MIXS:0000044.

  • [ ] deprecate MIXS:0000066 in favour of new term (MIXS:0001337)
    The new term will have the following details:

    name: marker_gene_recov_sw
    description: Software tool(s) used for marker gene (e.g. 16S rRNA) extraction/recovery from sequence data
    title: marker gene recovery software
    examples:
    - value: rambl;v2;default parameters
    in_subset:
    - sequencing
    keywords:
    - recover
    - software
    slot_uri: MIXS:0001337
    aliases: x16s_recover_software, 16s_recover_software, x16s recovery software
    range: string
    pattern: ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$
    structured_pattern:
      syntax: ^{software};{version};{parameters}$
      interpolated: true
      partial_match: true
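
For downstream tools that still hold the old IDs, the two proposed deprecations could be represented as a simple replaced-by lookup. This is a sketch, not an agreed MIxS mechanism: the names `REPLACED_BY` and `current_term` are hypothetical, and the mapping only mirrors the two checklist items above.

```python
# Deprecated term -> proposed replacement, as listed in the two checklist items.
REPLACED_BY = {
    "MIXS:0000065": "MIXS:0000044",  # x16s_recover -> target_gene
    "MIXS:0000066": "MIXS:0001337",  # x16s_recover_software -> marker_gene_recov_sw
}


def current_term(term_id: str) -> str:
    """Follow replaced-by links until reaching a term that is not deprecated."""
    seen = set()
    while term_id in REPLACED_BY:
        if term_id in seen:
            raise ValueError(f"replacement cycle at {term_id}")
        seen.add(term_id)
        term_id = REPLACED_BY[term_id]
    return term_id


print(current_term("MIXS:0000065"))
print(current_term("MIXS:0000044"))
```

Following links in a loop, rather than doing a single lookup, keeps the helper correct if a replacement term is itself deprecated later.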

only1chunts avatar Jun 19 '25 13:06 only1chunts

Both these changes make sense to me.

Woolly-at-EBI avatar Jun 19 '25 14:06 Woolly-at-EBI