ontology-development-kit
Remove dependency on OWLTools in standard workflows?
While most of the standard, ODK-generated workflows use ROBOT, there are still rules where OWLTOOLS is used:
$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR)
$(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o $@.tmp.owl && mv $@.tmp.owl $@ &&\
$(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@.tmp.owl && mv $@.tmp.owl $@
normalize_obo_src: $(SRC)
$(OWLTOOLS) $< --merge-axiom-annotations -o -f obo $(TMPDIR)/NORM.obo && $(ROBOT) convert -i $(TMPDIR)/NORM.obo -o $(TMPDIR)/NORM.tmp.obo && mv $(TMPDIR)/NORM.tmp.obo $(SRC)
Since we want to make it clear that ROBOT is the modern tool and that OWLTOOLS is no longer maintained, we should probably investigate how to replace the use of OWLTOOLS in those rules.
In the ODK’s standard workflow, owltools is still used in two places.
Subset generation
Subset ontologies are produced with the owltools --extract-ontology-subset command:
$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR)
$(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o $@.tmp.owl && mv $@.tmp.owl $@ &&\
$(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@.tmp.owl && mv $@.tmp.owl $@
The logic behind the --extract-ontology-subset command mostly resides in the makeMinimalSubsetOntology method of owltools’ owltools.mooncat.Mooncat class. In short, it extracts a coherent, minimal subset ontology around the terms carrying a given oboInOwl:inSubset annotation.
It does not seem that this command can easily be replaced by other existing tools. I tried two approaches:
- one involving several variations of robot filter --input $(ONT).owl --select "oboInOwl:inSubset=<subset to select>" --select "self annotations ancestors", but this systematically creates a subset that is much more “minimal” than the subset created by owltools --extract-ontology-subset;
- a 2-step approach where I first get the list of terms in the subset (robot filter --input $(ONT).owl --select "oboInOwl:inSubset=<subset to select>" --select self --preserve-structure false --trim true export --header ID --export subset-terms.txt), then use the extract command to extract a module based on those terms, but this systematically creates a module that is much bigger than what is generated by owltools --extract-ontology-subset (no matter which extraction method is used).
Normalizing the source file
Owltools’ --merge-axiom-annotations command is used when normalizing an OBO-format source file:
normalize_obo_src: $(SRC)
$(OWLTOOLS) $< --merge-axiom-annotations -o -f obo $(TMPDIR)/NORM.obo &&\
$(ROBOT) convert -i $(TMPDIR)/NORM.obo -o $(TMPDIR)/NORM.tmp.obo &&\
mv $(TMPDIR)/NORM.tmp.obo $(SRC)
The --merge-axiom-annotations command merges axioms that are logically equivalent while making sure that all the annotations on the original axioms are kept (see https://github.com/owlcollab/owltools/blob/master/OWLTools-Runner/src/main/java/owltools/cli/CommandRunner.java#L1349).
Again, it does not look like something that can easily be replicated by other tools, AFAIK.
The bottom line is that while owltools is now only used for 2 things (at least in the standard workflow; in Uberon’s very much non-standard workflow, it is also used to generate the composite-* products), for those things it appears irreplaceable, at least without a significant effort.
The question becomes, do we want to get rid of owltools enough to make that significant effort?
@matentzn An opinion as to whether removing owltools from the standard workflow would be worth the effort that it would require?
This robot PR is relevant to the subset issue: https://github.com/ontodev/robot/pull/1000 Not sure if it will be an exact replacement of the owltools functionality.
I didn’t know some efforts were already under way. Thanks, that’s good to know!
I’ve tested ROBOT’s new extraction method (extract --method subset --term-file ...). It yields results that are similar to owltools --extract-ontology-subset, but only when the owltools command is called without the --fill-gaps option.
I have found no way of reproducing with ROBOT the same kind of subsets produced by owltools --extract-ontology-subset --fill-gaps.
The --fill-gaps option is very crucial for this command - this was what the whole business with the ROBOT subset command was all about; can you characterise how the two approaches appear to differ?
Sorry, no. The approach used by OWLTools is implemented here: https://github.com/owlcollab/owltools/blob/9faa4f42b761839a26e8c8096cd24044e2bdcfc7/OWLTools-Core/src/main/java/owltools/mooncat/Mooncat.java#L832
If you believe I can tell how it differs from the approach used in ROBOT (https://github.com/ontodev/robot/blob/2345420d04ab29b1d7087f22e3a666295ece6002/robot-core/src/main/java/org/obolibrary/robot/ExtractOperation.java#L235), or whether it boils down to the same algorithm as proposed by Chris Mungall (https://github.com/ontodev/robot/issues/497#issuecomment-975873714), well, I appreciate your confidence in my understanding of graph theory, but I’m afraid that confidence is severely misplaced.
A few observations, though.
Here is how I use the “extract subset” method, based on how I understood it was supposed to be used (I did my tests with the BDS_subset of CL):
$ robot filter --input cl.owl --prefix 'cl: http://purl.obolibrary.org/obo/cl#' --select 'oboInOwl:inSubset=cl:BDS_subset' export --header ID --export bds_subset_terms.txt
$ robot extract --input cl.owl --method subset --term-file bds_subset_terms.txt --output bds_subset.owl
The first command merely extracts a list of all the terms annotated as being in the subset, while the second does the actual subset extraction.
But this command produces almost exactly the same subset as the following simple filter command:
$ robot filter --input cl.owl --prefix 'cl: http://purl.obolibrary.org/obo/cl#' --select annotations --select 'oboInOwl:inSubset=cl:BDS_subset' -o bds_subset.owl
So either
- I didn’t understand at all how extract --method subset was supposed to be used, or
- I do not understand what the method is supposed to yield (very likely: I have yet to meet two people who agree on what a “subset” actually is), or
- that command does not do what it was supposed to do.
Of note, according to Chris Mungall the subset command was supposed to be merely a shorthand for
filter --preserve-structure true --use-all-relations true --select annotations --select "oboInOwl:inSubset=subset:$SUBSET"
except that, unless again I am missing something, there is no such option as --use-all-relations. I also note that the “subsets” generated by the two sets of commands above have one thing in common: they contain absolutely no relations at all (no object properties). I suspect this might be an important clue (to me it looks like the subset extraction method is actually ignoring relations when it does its trick).
Even if I forcefully include the relationships I want in the subset (ROBOT’s documentation for extract seems to suggest this is needed, i.e. ROBOT will not automatically include the relations), the extracted subset will then contain the definitions of the object properties but they will not be used at all (none of the classes in the extracted subset will have any relations).
I appreciate your confidence in my understanding of graph theory, but I’m afraid that confidence is severely misplaced.
Haha sorry, I should have been clearer. While I am confident that you could, with a bit of time, characterise the algorithms, what I really meant to say is "describe the difference in the output at a high level", i.e. the one has 1000 fewer is_a relations than the other and 10K more part_of, or some such, which is what you proceeded to do afterwards! Thank you!
I am not concerned: I think the subset stuff we have in ROBOT now is superior to OWLTools and we should just retire it, and see who screams.
what I really meant to say is "describe the difference in the output at a high level", i.e. the one has 1000 fewer is_a relations than the other and 10K more part_of, or some such
Well, you can have a look for yourself.
Here is the subset generated by owltools --extract-ontology-subset:
bds_subset_owltools_nofillgaps.owl.txt
It contains precisely the 65 classes defined in the subset. It also contains almost all the object properties from the original ontology, but they are not used (none of the 65 classes in the subset has any relation to anything).
Here is the subset generated by owltools --extract-ontology-subset --fill-gaps:
bds_subset_owltools_fillgaps.owl.txt
It contains 669 classes (including the 65 from the subset itself). It contains the same object properties as the previous one, but here they are used. All 669 classes have their full set of relations.
Here is the subset generated by robot extract --method subset --term-file subset.txt (where subset.txt contains the list of terms defined in the subset, obtained by a previous filter --select 'oboInOwl:inSubset=cl:BDS_subset' command):
bds_subset_robot_extract_subset.owl.txt
It contains only the 65 classes of the subset itself. They have no relations (the object properties themselves are absent from the subset).
If I explicitly add the relations to the term-file argument (which I believe is necessary because of the change discussed here), this is the generated subset:
bds_subset_robot_extract_subset_with_relations.owl.txt
It still contains only the 65 classes of the subset itself. The object properties are present in the output, but they are not used.
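Class counts like the ones above can be roughly double-checked from the command line. The following is only a sketch: it assumes the attached files are RDF/XML in which every named class is declared with an <owl:Class rdf:about="..."> element, the function names list_classes and compare_classes are made up for illustration, and a grep-based count is an approximation rather than a real OWL parse.

```shell
# Hypothetical helpers, not part of ROBOT or owltools.
# Assumption: RDF/XML input where named classes appear as <owl:Class rdf:about="IRI">.

# Print the IRI of every declared named class, one per line, sorted and deduplicated.
list_classes() {
    grep -o '<owl:Class rdf:about="[^"]*"' "$1" \
        | sed -e 's/^.*rdf:about="//' -e 's/"$//' \
        | LC_ALL=C sort -u
}

# Print how many classes each file declares and how many are unique to each file.
compare_classes() {
    left=$(mktemp)
    right=$(mktemp)
    list_classes "$1" > "$left"
    list_classes "$2" > "$right"
    echo "classes in first: $(wc -l < "$left" | tr -d ' ')"
    echo "classes in second: $(wc -l < "$right" | tr -d ' ')"
    echo "only in first: $(LC_ALL=C comm -23 "$left" "$right" | wc -l | tr -d ' ')"
    echo "only in second: $(LC_ALL=C comm -13 "$left" "$right" | wc -l | tr -d ' ')"
    rm -f "$left" "$right"
}
```

For instance, compare_classes bds_subset_owltools_fillgaps.owl.txt bds_subset_robot_extract_subset.owl.txt would print the four counts for those two attachments.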
Incidentally, the ROBOT version is horrendously slower than the OWLTOOLS version: on CL, robot extract --method subset takes more than 3 minutes (~195 seconds) while owltools --extract-ontology-subset takes ~15 seconds.
I think the subset stuff we have in ROBOT now is superior to OWLTools and we should just retire it
What happened to “the --fill-gaps option is very crucial for this command”? I found no way of doing any kind of “gap filling” with ROBOT. As you can see in the examples above, the subset extracted by ROBOT always contains only the very terms marked with the inSubset annotation, and nothing more.
If there is a way to do with ROBOT what is done with owltools --extract-ontology-subset --fill-gaps, I would very much like to know it – that’s kind of what this entire ticket is about!
I wonder if there has been a misunderstanding between “gap filling” and “gap spanning”. All the discussion in the ROBOT ticket about the requested new subset command seems to be about “gap spanning” (ensuring relations are preserved in the subset, even if they are “indirect” relations that involve some intermediate classes that are not in the subset).
The “gap filling” done by owltools --extract-ontology-subset --fill-gaps is about including intermediate classes in the subset, something that seemingly has never been proposed as a goal of ROBOT’s new subset command.
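To make the distinction concrete, here is a toy sketch on a plain is_a hierarchy. It is not the algorithm used by owltools or by ROBOT; it only illustrates the two ideas as described above. The assumptions are mine: the hierarchy is given as a text file of "child parent" pairs, each term has at most one parent, there are no cycles, and the function names span_gaps and fill_gaps are hypothetical.

```shell
# Toy illustration only; neither function reproduces owltools or ROBOT.
# $1: edges file with one "child parent" pair per line (single parent, acyclic).
# $2: subset file with one term per line.

# Gap spanning: link each subset term directly to its nearest ancestor
# that is also in the subset; intermediate terms are left out.
span_gaps() {
    awk 'NR == FNR { parent[$1] = $2; next }
         { in_subset[$1] = 1; terms[++n] = $1 }
         END {
             for (i = 1; i <= n; i++) {
                 p = parent[terms[i]]
                 while (p != "" && !(p in in_subset)) p = parent[p]
                 if (p != "") print terms[i], p
             }
         }' "$1" "$2"
}

# Gap filling: keep the original edges, which pulls the intermediate
# terms between two subset terms into the output.
fill_gaps() {
    awk 'NR == FNR { parent[$1] = $2; next }
         { in_subset[$1] = 1; terms[++n] = $1 }
         END {
             for (i = 1; i <= n; i++) {
                 # Find the nearest ancestor that is in the subset, if any.
                 p = parent[terms[i]]
                 while (p != "" && !(p in in_subset)) p = parent[p]
                 if (p == "") continue
                 # Emit every original edge on the path up to that ancestor.
                 c = terms[i]
                 while (c != p) { print c, parent[c]; c = parent[c] }
             }
         }' "$1" "$2"
}
```

With the chain D is_a C is_a B is_a A and a subset made of D and A, span_gaps yields the single edge "D A", while fill_gaps yields "D C", "C B" and "B A", so that C and B end up in the result even though they are not marked as being in the subset.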
Incidentally, the ROBOT version is horrendously slower than the OWLTOOLS version: on CL, robot extract --method subset takes more than 3 minutes (~195 seconds) while owltools --extract-ontology-subset takes ~15 seconds.
It's a completely different algorithm; it's running relation-graph internally which is pretty intensive (but logically complete).
I made a separate issue for the gap-filling (include intermediates) option:
- https://github.com/ontodev/robot/issues/1128
If this is implemented AND we are satisfied with efficiency THEN I believe we can remove owltools
Regarding efficiency, it wasn't clear to me whether the comments about robot subset referred to running RG (relation-graph) with a property subset or with all properties. Note that even if this is addressed, there is still room for a more efficient operation that uses HOP over ENTAILMENT. See OAK docs for an explanation of this: https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html#graph-traversal-strategies