Delete children of 'C_D_box_snoRNA' and 'H_ACA_box_snoRNA'

Open sjm41 opened this issue 3 years ago • 7 comments

Following discussions amongst RNAcentral members (and specifically with Michelle Scott and Ruth Seal), we'd like to propose deleting the children of 'C_D_box_snoRNA' and 'H_ACA_box_snoRNA' (in the ncRNA, ncRNA_gene and primary_transcript branches of the SO) - their IDs could be added as secondary IDs to the current parent terms.

The reason is that these snoRNA subtypes correspond to either functional terms or discrete Rfam families, which both seem to be outside the scope of SO and SO annotation. Details below.

snoRNA terms defined based on function:

  • methylation_guide_snoRNA (SO:0005841)
  • methylation_guide_snoRNA_primary_transcript (SO:0000580)
  • methylation_guide_snoRNA_gene (SO:0002379)
  • pseudouridylation_guide_snoRNA (SO:0001187)
  • pseudouridylation_guide_snoRNA_gene (SO:0002380)

=> These refer purely to the function of snoRNAs, which seems to be outside the scope of the SO and better captured via GO annotation (e.g. GO:0030558 RNA pseudouridylation guide activity & GO:0030561 RNA 2'-O-ribose methylation guide activity). Also see #59.

snoRNA terms based on specific Rfam families:

  • U3_snoRNA (SO:0001179)
  • U3_snoRNA_gene (SO:0002378)
  • U14_snoRNA (SO:0000403)
  • U14_snoRNA_primary_transcript (SO:0005837)
  • U14_snoRNA_gene (SO:0002377)

=> These terms correspond to/are defined by specific Rfam families (U3_snoRNA = RF00012; U14_snoRNA = RF00016), which again seems to be outside the scope of the SO. That is, if there are SO terms specifically for the U3 and U14 families, then there could/should be similar SO terms for the many other snoRNA families too, which would be too much (and is better left to Rfam).

sjm41 avatar Jan 18 '22 15:01 sjm41

@keilbeck There are a few PMIDs attached to these terms, but they're all over 20 years old. I see you're listed as a reference for methylation_guide_snoRNA_primary_transcript (SO:0000580). Were these branches added for a particular group or use case? If so, I just want to do some digging before obsoleting anything.

egchristensen avatar Nov 02 '22 17:11 egchristensen

I believe Rfam asked for these terms, but it was a long time ago. If Rfam does not think they are appropriate, then we should go with the experts. Maybe we can find out if there is any usage of these terms and let people know the better way to annotate?

keilbeck avatar Nov 02 '22 19:11 keilbeck

@blakesweeney Does Rfam use/need these snoRNA SO terms?

  • U3_snoRNA (SO:0001179)
  • U3_snoRNA_gene (SO:0002378)
  • U14_snoRNA (SO:0000403)
  • U14_snoRNA_primary_transcript (SO:0005837)
  • U14_snoRNA_gene (SO:0002377)

The proposal in the original post is to obsolete them as 'out of scope' of SO (similar reasoning to #546).

sjm41 avatar Nov 03 '22 07:11 sjm41

While Rfam uses the U3_snoRNA and U14_snoRNA terms, I'd be fine with removing them for similar reasons as #546. We can switch to the correct snoRNA terms instead. I'd also like to say the terms mentioned in #546 would be good candidates to remove.

blakesweeney avatar Nov 03 '22 09:11 blakesweeney

> Maybe we can find out if there is any usage of these terms and let people know the better way to annotate?

I don't know the best way of doing this, but the only relevant hits from a Google search for each of the 10 SO IDs listed above are for sequenceontology.org or the SO GitHub page. So it seems that these 10 terms are used very little, if at all. I think they can be safely obsoleted. Their IDs could be added as secondary IDs to their parent terms? We could also add a note to each obsoleted term explaining the reason for obsoletion and pointing to the relevant Rfam or GO term.

sjm41 avatar Nov 08 '22 07:11 sjm41
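As a complement to the web search described above, one could also scan annotation files directly. The following is only a minimal sketch, assuming the annotations of interest are available as GFF3, where the third column holds the feature type as an SO term name or accession. The function name `count_term_usage` and the choice of GFF3 as the input are illustrative assumptions, not something anyone in this thread has proposed or run.

```python
from collections import Counter

# The ten term IDs and names proposed for obsoletion in this issue.
CANDIDATE_TERMS = {
    "SO:0005841": "methylation_guide_snoRNA",
    "SO:0000580": "methylation_guide_snoRNA_primary_transcript",
    "SO:0002379": "methylation_guide_snoRNA_gene",
    "SO:0001187": "pseudouridylation_guide_snoRNA",
    "SO:0002380": "pseudouridylation_guide_snoRNA_gene",
    "SO:0001179": "U3_snoRNA",
    "SO:0002378": "U3_snoRNA_gene",
    "SO:0000403": "U14_snoRNA",
    "SO:0005837": "U14_snoRNA_primary_transcript",
    "SO:0002377": "U14_snoRNA_gene",
}


def count_term_usage(gff3_lines):
    """Count features whose type column (column 3) uses a candidate term.

    Hypothetical helper: accepts any iterable of GFF3 lines and returns a
    Counter keyed by SO ID, whether the file used the name or the accession.
    """
    name_to_id = {name: so_id for so_id, name in CANDIDATE_TERMS.items()}
    counts = Counter()
    for line in gff3_lines:
        # Skip blank lines, comments and directives.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 3:
            continue  # not a feature line (e.g. embedded FASTA)
        feature_type = columns[2]
        if feature_type in CANDIDATE_TERMS:
            counts[feature_type] += 1
        elif feature_type in name_to_id:
            counts[name_to_id[feature_type]] += 1
    return counts
```

An empty result for a given file would only show that the file does not use these terms as feature types; it says nothing about usage in other formats or databases.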

I think we can go ahead and obsolete these terms, but leave a comment redirecting people to the proper annotation source. @sjm41, could I ask you to identify the right Rfam/GO IDs or links to include as I obsolete these? The terms will remain in SO, but they will be disconnected from their parents and marked as obsolete, with a reference to where people should go from now on.

egchristensen avatar Nov 14 '22 21:11 egchristensen
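To make the obsoletion plan above concrete, here is a rough sketch of what an obsoleted term might look like in OBO format: the stanza keeps its ID and name, carries no `is_a` links to a parent, is flagged with `is_obsolete: true`, and has a comment pointing elsewhere. The helper `obsolete_stanza` and the comment wording are assumptions made for illustration; the tags and text the SO editors actually use may differ.

```python
def obsolete_stanza(term_id, name, reason, see_instead):
    """Render an OBO [Term] stanza for an obsoleted term.

    Hypothetical helper: no is_a lines are emitted, which reflects the term
    being disconnected from its former parents.
    """
    lines = [
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        # Assumed comment wording; not taken from the SO itself.
        f"comment: Obsoleted because {reason}. See {see_instead} instead.",
        "is_obsolete: true",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        obsolete_stanza(
            "SO:0001187",
            "pseudouridylation_guide_snoRNA",
            "it describes function, which is out of scope for SO",
            "GO:0030558 (RNA pseudouridylation guide activity)",
        )
    )
```

Whether the old ID should additionally be recorded on the parent term as a secondary ID, as suggested earlier in the thread, is a separate decision that this sketch does not cover.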

  • methylation_guide_snoRNA (SO:0005841) = GO:0030561
  • methylation_guide_snoRNA_primary_transcript (SO:0000580) = GO:0030561
  • methylation_guide_snoRNA_gene (SO:0002379) = GO:0030561

  • pseudouridylation_guide_snoRNA (SO:0001187) = GO:0030558
  • pseudouridylation_guide_snoRNA_gene (SO:0002380) = GO:0030558

  • U3_snoRNA (SO:0001179) = RF00012
  • U3_snoRNA_gene (SO:0002378) = n/a or RF00012

  • U14_snoRNA (SO:0000403) = RF00016
  • U14_snoRNA_primary_transcript (SO:0005837) = n/a or RF00016
  • U14_snoRNA_gene (SO:0002377) = n/a or RF00016

sjm41 avatar Nov 15 '22 16:11 sjm41
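For anyone who needs to migrate existing annotations, the mapping above could be captured as a small lookup table. This is just a sketch: the names `REDIRECTS` and `redirect_for` are made up for illustration, and entries listed above as "n/a or ..." are flagged as tentative rather than resolved one way or the other.

```python
# Obsoleted SO ID -> (suggested target, tentative?), as listed in this thread.
# tentative=True marks entries given as "n/a or <Rfam ID>".
REDIRECTS = {
    "SO:0005841": ("GO:0030561", False),
    "SO:0000580": ("GO:0030561", False),
    "SO:0002379": ("GO:0030561", False),
    "SO:0001187": ("GO:0030558", False),
    "SO:0002380": ("GO:0030558", False),
    "SO:0001179": ("RF00012", False),
    "SO:0002378": ("RF00012", True),
    "SO:0000403": ("RF00016", False),
    "SO:0005837": ("RF00016", True),
    "SO:0002377": ("RF00016", True),
}


def redirect_for(so_id):
    """Return (target, tentative) for an obsoleted SO ID, or None if unlisted."""
    return REDIRECTS.get(so_id)


if __name__ == "__main__":
    for so_id in ("SO:0001187", "SO:0002378", "SO:0000001"):
        print(so_id, "->", redirect_for(so_id))
```

Returning `None` for unlisted IDs keeps the lookup from guessing at terms that were never part of this proposal.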