owltools
                                
Terms are not completely removed by --make-species-subset
I'd like to get an OBO file of GO terms that make sense for pombe. I've tried:
owltools go-plus.obo --reasoner elk --make-species-subset -t NCBITaxon:4896 -o -f obo go-pombe-subset.obo
That nearly does what I need but the output OBO file has some stanzas that I didn't expect.
As an example, in go-plus.obo, GO:0000795 ("synaptonemal complex") has this relationship:
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe
so I was expecting that GO:0000795 wouldn't appear in the output. Instead it was there but with just "id:" and "alt_id:" lines:
[Term]
id: GO:0000795
alt_id: GO:0005716
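To see how many stanzas are affected, something like the following could scan the output for terms that have nothing but "id:" and "alt_id:" lines. This is only a rough sketch and not part of owltools: the function name `find_stub_terms` is made up, it assumes stanzas are separated by blank lines, and the `EX:` term in the sample is hypothetical.

```python
def find_stub_terms(obo_text):
    """Return IDs of [Term] stanzas that contain only id: and alt_id: tags."""
    stubs = []
    # Assumption: stanzas are separated by blank lines, as in typical OBO output.
    for block in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines or lines[0] != "[Term]":
            continue  # skips the header block and [Typedef] stanzas
        tags = [ln.split(":", 1)[0] for ln in lines[1:]]
        if "id" in tags and set(tags) <= {"id", "alt_id"}:
            term_id = next(
                ln.split(":", 1)[1].strip() for ln in lines[1:] if ln.startswith("id:")
            )
            stubs.append(term_id)
    return stubs


# Sample text: the first stanza is the one from this issue,
# the second is a hypothetical complete term for contrast.
sample = """[Term]
id: GO:0000795
alt_id: GO:0005716

[Term]
id: EX:0000001
name: hypothetical complete term
is_a: EX:0000002
"""

stub_ids = find_stub_terms(sample)
```

On the real file, the same function could be run on the contents of go-pombe-subset.obo.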
CC @ValWood @mah11
Interesting. This is obviously not something you want, but it's a predictable result of how things are specified.
Currently, the species subsetter will not try to remove any deprecated classes (deprecated classes have no logical axioms to allow filtering). One category of deprecated classes is classes that have been merged into another class (alt_id).
Probably the most sensible thing here is a cascading delete rule. The alt_id axioms are removed if they point to a class that has been filtered out. If a deprecated class is left with no axioms then it itself is removed.
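A minimal sketch of that cascading delete rule, in plain Python rather than against the OWL API that owltools actually uses. The data model here is an assumption made for illustration: each deprecated class maps to a list of (predicate, target) pairs, `alt_id_of` is a made-up predicate name, and the `EX:` IDs are hypothetical.

```python
def cascade_delete(deprecated, removed):
    """Apply the two-step rule to deprecated classes.

    deprecated: dict of deprecated class ID -> list of (predicate, target) axioms
    removed:    set of class IDs that the subsetter filtered out
    """
    survivors = {}
    for cls, axioms in deprecated.items():
        # Step 1: drop alt_id axioms that point at a filtered-out class.
        kept = [
            (pred, target)
            for pred, target in axioms
            if not (pred == "alt_id_of" and target in removed)
        ]
        # Step 2: a deprecated class left with no axioms is itself removed.
        if kept:
            survivors[cls] = kept
    return survivors


deprecated = {
    # From this issue: GO:0005716 was merged into GO:0000795.
    "GO:0005716": [("alt_id_of", "GO:0000795")],
    # Hypothetical: merged into a class that survives subsetting.
    "EX:0000002": [("alt_id_of", "EX:0000001")],
}
removed = {"GO:0000795"}

remaining = cascade_delete(deprecated, removed)
```

With this rule the alt_ids of surviving terms are kept, and only those whose target was filtered out disappear.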
Perhaps even more straightforward would be a removal of all deprecated classes prior to subsetting, but that may be too extreme for you?
If I'm following, that would mean removing all alt_ids? If so, yes that would cause us problems.
I don't know whether the Jenkins check follows part_of (or the regulates relations, etc.), or only is_a.
Presumably the taxon check should follow all these relations?
Some of them are removed by default:
"evolved_from",
"homologous_to",
"evolved_multiple_times_in",
//"shares ancestor with"
"RO:0002158"
Thanks @fbastian
One thing I forgot to mention is that there are two approaches for doing these kinds of things. --make-species-subset uses a method that is highly eliminative. Essentially:
- Assume your taxon is the only one in the world (literally: we assert an axiom that states this)
- Remove any axioms that involve cross-species dependence (@fbastian's list)
- Use the reasoner to find unsatisfiable classes, which include any that have an existential dependence, direct or indirect, on an unsatisfiable class
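The last step above can be pictured as a fixpoint over dependencies. The following is a toy illustration only: it is not how Elk or owltools work internally, it ignores everything about OWL semantics except "depends on an unsatisfiable class", and the function name and the `EX:` IDs are hypothetical.

```python
def unsatisfiable_classes(directly_excluded, depends_on):
    """Propagate unsatisfiability through existential dependencies.

    directly_excluded: set of class IDs ruled out for the taxon
                       (e.g. via never_in_taxon)
    depends_on:        dict of class ID -> set of class IDs it existentially
                       depends on (is_a, part_of, regulates, ...)
    """
    unsat = set(directly_excluded)
    changed = True
    while changed:  # repeat until no new class is added
        changed = False
        for cls, targets in depends_on.items():
            if cls not in unsat and targets & unsat:
                unsat.add(cls)
                changed = True
    return unsat


depends_on = {
    "EX:0000010": {"GO:0000795"},  # hypothetical: part_of the excluded class
    "EX:0000011": {"EX:0000010"},  # hypothetical: regulates EX:0000010
    "EX:0000012": {"EX:0000013"},  # hypothetical: unrelated to the excluded class
}

unsat = unsatisfiable_classes({"GO:0000795"}, depends_on)
```

Here the indirect dependent EX:0000011 is eliminated along with the direct one, while EX:0000012 is untouched.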
TL;DR - this is what you want for making an S. pombe subset. It's also what we use for uberon subsets. But you would never use this for a pathogenic yeast or anything where you have symbiotic gene associations (i.e. regulates relations crossing into processes that can't be carried out natively in that organism).
The Jenkins check uses a less eliminative, more conservative rule that doesn't state the solipsistic axiom. In this case classes are only eliminated if they are inferred via the property chain axioms for in taxon, which can be seen here: http://purl.obolibrary.org/obo/RO_0002162
This means the Jenkins check won't flag pathogenic regulation as a problem, but it is more liberal when it comes to your non-pathogenic yeast.
None of this has anything to do with the alt_id q, btw
> If I'm following, that would mean removing all alt_ids? If so, yes that would cause us problems.
OK, then we have to go with the cascading delete solution.