owltools

Terms are not completely removed by --make-species-subset

Open · kimrutherford opened this issue 9 years ago · 7 comments

I'd like to get an OBO file of GO terms that make sense for pombe. I've tried:

owltools go-plus.obo --reasoner elk --make-species-subset -t NCBITaxon:4896 -o -f obo go-pombe-subset.obo

That nearly does what I need, but the output OBO file has some stanzas that I didn't expect.

As an example, in go-plus.obo, GO:0000795 ("synaptonemal complex") has this relationship:

relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe

so I was expecting that GO:0000795 wouldn't appear in the output. Instead, it was there, but with only "id:" and "alt_id:" lines:

[Term]
id: GO:0000795
alt_id: GO:0005716

CC @ValWood @mah11

kimrutherford, Sep 26 '16 02:09

Interesting. This is obviously not something you want, but it's a predictable result of how things are specified.

Currently, the species subsetter will not try to remove any deprecated classes (deprecated classes have no logical axioms that would allow them to be filtered). One category of deprecated classes is classes that have been merged into another class (alt_id).

Probably the most sensible thing here is a cascading delete rule: alt_id axioms are removed if they point to a class that has been filtered out, and if a deprecated class is left with no axioms, it is itself removed.
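A minimal sketch of that cascading delete rule, in Python. This is not owltools code: it assumes a hypothetical model in which each deprecated (merged) class is reduced to the set of live class ids its alt_id axioms point at, and those links are treated as its only axioms. `cascade_delete` and the `EX:` ids are made up for illustration; GO:0000795 and GO:0005716 are the ids from the report above.

```python
from typing import Dict, Set


def cascade_delete(
    kept_classes: Set[str],
    deprecated_links: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Apply a cascading delete to deprecated (merged) classes.

    kept_classes: ids of live classes that survived species subsetting.
    deprecated_links: hypothetical model mapping each deprecated class id to
        the live class ids its alt_id axioms point at; these links are treated
        as the deprecated class's only axioms.
    Returns the deprecated classes to keep, with dangling links pruned.
    """
    survivors: Dict[str, Set[str]] = {}
    for dep_id, targets in deprecated_links.items():
        # Rule 1: drop alt_id axioms that point at a filtered-out class.
        remaining = {t for t in targets if t in kept_classes}
        # Rule 2: a deprecated class left with no axioms is removed too.
        if remaining:
            survivors[dep_id] = remaining
    return survivors


if __name__ == "__main__":
    # GO:0000795 was filtered out for pombe, so its merged id GO:0005716
    # should disappear. EX:0000001/EX:0000002 are made-up ids for a class
    # that survived and its alt_id.
    kept = {"EX:0000001"}
    links = {
        "GO:0005716": {"GO:0000795"},
        "EX:0000002": {"EX:0000001"},
    }
    print(cascade_delete(kept, links))  # {'EX:0000002': {'EX:0000001'}}
```

Because only dangling links are pruned, alt_ids of classes that survive the subset are left untouched, which is the property that matters for keeping existing alt_ids usable.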

Perhaps even more straightforward would be a removal of all deprecated classes prior to subsetting, but that may be too extreme for you?

cmungall, Sep 26 '16 17:09

> Perhaps even more straightforward would be a removal of all deprecated classes prior to subsetting, but that may be too extreme for you?

If I'm following, that would mean removing all alt_ids? If so, yes, that would cause us problems.

kimrutherford, Sep 28 '16 05:09

I don't know whether the Jenkins check follows part_of (or the regulates relations, etc.), or only is_a.

mah11, Sep 28 '16 09:09

Presumably the taxon check should follow all these relations?

ValWood, Sep 28 '16 10:09

Some relations are removed by default:

"evolved_from",
"homologous_to",
"evolved_multiple_times_in",
//"shares ancestor with"
"RO:0002158"

fbastian, Sep 28 '16 18:09

Thanks @fbastian

One thing I forgot to mention is that there are two approaches for doing these kinds of things. --make-species-subset uses a method that is highly eliminative. Essentially:

  • Assume your taxon is the only one in the world (literally: we assert an axiom that states this)
  • Remove any axioms that involve cross-species dependence (@fbastian's list)
  • Use a reasoner to find unsatisfiable classes, which include any class that has an existential dependence, direct or indirect, on an unsatisfiable class
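The three steps above can be illustrated with a toy fixed-point propagation in Python. This is a hypothetical simplification, not what owltools or ELK does internally: each axiom is a `(subject, relation, object)` triple read as "subject is_a object" or "subject SubClassOf relation some object", and the solipsistic axiom is modelled by treating every never_in_taxon class as unsatisfiable from the start. The relation names in `CROSS_SPECIES` come from @fbastian's list; `eliminative_subset` and the class names are made up.

```python
from typing import AbstractSet, List, Set, Tuple

# Toy axiom: (subject, relation, object). relation "is_a" means subclass;
# anything else means "subject SubClassOf relation some object".
Axiom = Tuple[str, str, str]

# Cross-species relations dropped before reasoning (from the list above;
# RO:0002158 carries the comment "shares ancestor with" there).
CROSS_SPECIES = frozenset({
    "evolved_from",
    "homologous_to",
    "evolved_multiple_times_in",
    "RO:0002158",
})


def eliminative_subset(
    classes: Set[str],
    axioms: List[Axiom],
    never_in_target_taxon: Set[str],
    dropped_relations: AbstractSet[str] = CROSS_SPECIES,
) -> Set[str]:
    """Return the classes that survive a toy eliminative species subset."""
    # Step 1: under the solipsistic axiom, never_in_taxon classes are unsatisfiable.
    unsat = set(never_in_target_taxon) & classes
    # Step 2: ignore axioms that use cross-species relations.
    kept = [a for a in axioms if a[1] not in dropped_relations]
    # Step 3: propagate to a fixed point. If the object of an is_a or of any
    # existential restriction is unsatisfiable, the subject is too.
    changed = True
    while changed:
        changed = False
        for subject, _relation, obj in kept:
            if obj in unsat and subject not in unsat:
                unsat.add(subject)
                changed = True
    return classes - unsat


if __name__ == "__main__":
    # Hypothetical toy classes; only the shape of the dependencies matters.
    classes = {"complex_A", "assembly_of_A", "regulation_of_assembly_of_A",
               "A_homolog_elsewhere", "unrelated_process"}
    axioms = [
        ("assembly_of_A", "results_in_assembly_of", "complex_A"),
        ("regulation_of_assembly_of_A", "regulates", "assembly_of_A"),
        ("A_homolog_elsewhere", "homologous_to", "complex_A"),
    ]
    print(sorted(eliminative_subset(classes, axioms, {"complex_A"})))
    # ['A_homolog_elsewhere', 'unrelated_process']
```

Note that `regulation_of_assembly_of_A` is eliminated even though it only regulates the excluded class; that is why this approach suits a free-living organism but not one whose genes act on processes in another species.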

TL;DR: this is what you want for making an S. pombe subset. It's also what we use for Uberon subsets. But you would never use this for a pathogenic yeast or anything where you have symbiotic gene associations (i.e. regulates relations crossing into processes that can't be carried out natively in that organism).

The Jenkins check uses a less eliminative, more conservative rule that doesn't assert the solipsistic axiom. In this case, classes are only eliminated when that is inferred via the property chain axioms for in taxon, which can be seen here: http://purl.obolibrary.org/obo/RO_0002162

This means the Jenkins check won't flag pathogenic regulation as a problem, but it is more liberal when it comes to your non-pathogenic yeast.
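For contrast, a toy version of the more conservative rule: the exclusion only travels along relations that the in_taxon property chains cover. The set `CHAIN_RELATIONS` below is an assumption for illustration only; the actual chains are the ones on RO_0002162 and are not reproduced here. `conservative_exclusions` and the class names are hypothetical, and the same `(subject, relation, object)` triple model is used.

```python
from typing import AbstractSet, List, Set, Tuple

# Toy axiom: (subject, relation, object).
Axiom = Tuple[str, str, str]

# ASSUMPTION: relations through which the in_taxon property chains let an
# exclusion travel. Illustrative only; see RO_0002162 for the real chains.
CHAIN_RELATIONS = frozenset({"is_a", "part_of"})


def conservative_exclusions(
    classes: Set[str],
    axioms: List[Axiom],
    never_in_target_taxon: Set[str],
    chain_relations: AbstractSet[str] = CHAIN_RELATIONS,
) -> Set[str]:
    """Return classes excluded when only chain relations carry the constraint."""
    excluded = set(never_in_target_taxon) & classes
    changed = True
    while changed:
        changed = False
        for subject, relation, obj in axioms:
            # Relations outside the chain set (e.g. regulates) do not pass
            # the exclusion on, unlike in the eliminative rule.
            if relation in chain_relations and obj in excluded and subject not in excluded:
                excluded.add(subject)
                changed = True
    return excluded


if __name__ == "__main__":
    classes = {"process_A", "step_of_A", "regulation_of_A", "other"}
    axioms = [
        ("step_of_A", "part_of", "process_A"),
        ("regulation_of_A", "regulates", "process_A"),
    ]
    print(sorted(conservative_exclusions(classes, axioms, {"process_A"})))
    # ['process_A', 'step_of_A']
```

With `regulates` outside the chain set, `regulation_of_A` is not excluded, mirroring the point that this kind of check does not treat cross-organism regulation as a violation.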

None of this has anything to do with the alt_id question, btw.

cmungall, Sep 28 '16 18:09

> If I'm following, that would mean removing all alt_ids? If so, yes, that would cause us problems.

OK, then we have to go with the cascading delete solution.

cmungall, Sep 28 '16 18:09