
Remove dependency to OWLTools in standard workflows?

Open gouttegd opened this issue 2 years ago • 16 comments

While most of the standard, ODK-generated workflows use ROBOT, there are still rules where OWLTOOLS is used:

$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR)
        $(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o $@.tmp.owl && mv $@.tmp.owl $@ &&\
        $(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@.tmp.owl && mv $@.tmp.owl $@

normalize_obo_src: $(SRC)
        $(OWLTOOLS) $< --merge-axiom-annotations -o -f obo $(TMPDIR)/NORM.obo &&\
        $(ROBOT) convert -i $(TMPDIR)/NORM.obo -o $(TMPDIR)/NORM.tmp.obo &&\
        mv $(TMPDIR)/NORM.tmp.obo $(SRC)

Since we want to make it clear that ROBOT is the modern tool and that OWLTools is no longer maintained, we should probably investigate how to replace OWLTools in those rules.

gouttegd avatar Jun 20 '22 14:06 gouttegd

In the ODK’s standard workflow, owltools is still used in two places.

Subset generation

Subset ontologies are produced with the owltools --extract-ontology-subset command:

$(SUBSETDIR)/%.owl: $(ONT).owl | $(SUBSETDIR)
        $(OWLTOOLS) $< --extract-ontology-subset --fill-gaps --subset $* -o $@.tmp.owl && mv $@.tmp.owl $@ &&\
        $(ROBOT) annotate --input $@ --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) -o $@.tmp.owl && mv $@.tmp.owl $@

What the --extract-ontology-subset command does mostly resides in the makeMinimalSubsetOntology method of owltools’ owltools.mooncat.Mooncat class, but basically it extracts a coherent, minimal subset ontology around the terms carrying a given oboInOwl:inSubset annotation.

It does not seem that this command can easily be replaced by other existing tools. I tried two approaches:

  • one involving several variations of robot filter --input $(ONT).owl --select "oboInOwl:inSubset=<subset to select>" --select "self annotations ancestors", but this systematically creates a subset that is much more “minimal” than the subset created by owltools --extract-ontology-subset;
  • a 2-step approach where I first get the list of terms in the subset (robot filter --input $(ONT).owl --select "oboInOwl:inSubset=<subset to select>" --select self --preserve-structure false --trim true export --header ID --export subset-terms.txt), then use the extract command to extract a module based on those terms, but this systematically creates a module that is much bigger than what is generated by owltools --extract-ontology-subset (no matter which extraction method is used).

Normalizing the source file

Owltools’ --merge-axiom-annotations command is used when normalizing an OBO-format source file:

normalize_obo_src: $(SRC)
	$(OWLTOOLS) $< --merge-axiom-annotations -o -f obo $(TMPDIR)/NORM.obo &&\
	$(ROBOT) convert -i $(TMPDIR)/NORM.obo -o $(TMPDIR)/NORM.tmp.obo &&\
	mv $(TMPDIR)/NORM.tmp.obo $(SRC)

The --merge-axiom-annotations command merges axioms that are logically equivalent while making sure that all the annotations on the original axioms are kept (see https://github.com/owlcollab/owltools/blob/master/OWLTools-Runner/src/main/java/owltools/cli/CommandRunner.java#L1349).
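
To make the idea concrete, here is a minimal Python sketch of that kind of merge. It is not the owltools implementation: it assumes that "logically equivalent" can be approximated as "identical once annotations are stripped", and the `ex:` terms, the `ref:` values and the function name `merge_axiom_annotations` are all hypothetical.

```python
def merge_axiom_annotations(axioms):
    """Collapse repeated logical axioms into one, keeping all annotations.

    `axioms` is an iterable of (logical_axiom, annotations) pairs, where
    `logical_axiom` is any hashable value (the axiom without annotations)
    and `annotations` is an iterable of (property, value) pairs.
    Assumption: two axioms are "the same" when their logical parts are
    equal; a real tool may use a broader notion of equivalence.
    """
    merged = {}
    for logical_axiom, annotations in axioms:
        # Union the annotations of every copy of the same logical axiom.
        merged.setdefault(logical_axiom, set()).update(annotations)
    # Dicts keep insertion order, so axioms come out in first-seen order.
    return [(axiom, sorted(annots)) for axiom, annots in merged.items()]


# Hypothetical input: the same SubClassOf axiom asserted twice,
# each time with a different source annotation.
axioms = [
    (("SubClassOf", "ex:T_cell", "ex:lymphocyte"), [("oboInOwl:source", "ref:1")]),
    (("SubClassOf", "ex:T_cell", "ex:lymphocyte"), [("oboInOwl:source", "ref:2")]),
    (("SubClassOf", "ex:B_cell", "ex:lymphocyte"), []),
]
result = merge_axiom_annotations(axioms)
```

In this toy input the two copies of the first axiom collapse into a single axiom carrying both source annotations, while the third axiom is left untouched.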

Again, it does not look like something that can easily be replicated by other tools, AFAIK.

The bottom line is that while owltools is now only used for 2 things (at least in the standard workflow; in Uberon’s very much non-standard workflow, it is also used to generate the composite-* products), for those things it appears irreplaceable, at least without a significant effort.

The question becomes, do we want to get rid of owltools enough to make that significant effort?

gouttegd avatar Oct 31 '22 18:10 gouttegd

@matentzn An opinion as to whether removing owltools from the standard workflow would be worth the effort that it would require?

gouttegd avatar Oct 31 '22 18:10 gouttegd

This robot PR is relevant to the subset issue: https://github.com/ontodev/robot/pull/1000. Not sure if it will be an exact replacement of the owltools functionality.

balhoff avatar Oct 31 '22 18:10 balhoff

I didn’t know some efforts were already under way. Thanks, that’s good to know!

gouttegd avatar Oct 31 '22 18:10 gouttegd

I’ve tested ROBOT’s new extraction method (extract --method subset --term-file ...). It yields results that are similar to owltools --extract-ontology-subset, but only when the owltools command is called without the --fill-gaps option.

I have found no way of reproducing with ROBOT the same kind of subsets produced by owltools --extract-ontology-subset --fill-gaps.

gouttegd avatar Jun 12 '23 19:06 gouttegd

The --fill-gaps option is crucial for this command; it is what the whole business with the ROBOT subset command was all about. Can you characterise how the two approaches appear to differ?

matentzn avatar Jun 13 '23 08:06 matentzn

Sorry, no. The approach used by OWLTools is implemented here: https://github.com/owlcollab/owltools/blob/9faa4f42b761839a26e8c8096cd24044e2bdcfc7/OWLTools-Core/src/main/java/owltools/mooncat/Mooncat.java#L832

If you believe I can tell how it differs from the approach used in ROBOT (https://github.com/ontodev/robot/blob/2345420d04ab29b1d7087f22e3a666295ece6002/robot-core/src/main/java/org/obolibrary/robot/ExtractOperation.java#L235), or whether it boils down to the same algorithm as proposed by Chris Mungall (https://github.com/ontodev/robot/issues/497#issuecomment-975873714), well, I appreciate your confidence in my understanding of graph theory, but I’m afraid that confidence is severely misplaced.

gouttegd avatar Jun 13 '23 10:06 gouttegd

A few observations, though.

Here is how I use the “extract subset” method, based on how I understood it was supposed to be used (I did my tests with the BDS_subset of CL):

$ robot filter --input cl.owl --prefix 'cl: http://purl.obolibrary.org/obo/cl#' --select 'oboInOwl:inSubset=cl:BDS_subset' export --header ID --export bds_subset_terms.txt
$ robot extract --input cl.owl --method subset --term-file bds_subset_terms.txt --output bds_subset.owl

The first command merely extracts a list of all the terms annotated as being in the subset, while the second does the actual subset extraction.

But this command produces almost exactly the same subset as the following simple filter command:

$ robot filter --input cl.owl --prefix 'cl: http://purl.obolibrary.org/obo/cl#' --select annotations --select 'oboInOwl:inSubset=cl:BDS_subset' -o bds_subset.owl

So either

  • I didn’t understand at all how extract --method subset was supposed to be used, or
  • I do not understand what the method is supposed to yield (very likely: I have yet to meet two people who agree on what a “subset” actually is), or
  • that command does not do what it was supposed to do.

Of note, according to Chris Mungall the subset command was supposed to be merely a shorthand for

filter --preserve-structure true --use-all-relations true --select annotations --select "oboInOwl:inSubset=subset:$SUBSET"

except that, unless again I am missing something, there is no such option as --use-all-relations. And I note that the “subsets” generated by the two sets of commands above have in common that they contain absolutely no relations at all (no object properties), which I suspect might be an important clue (to me it looks like the subset extraction method is actually ignoring relations when it does its trick).

gouttegd avatar Jun 13 '23 10:06 gouttegd

Even if I forcefully include the relationships I want in the subset (ROBOT’s documentation for extract seems to suggest this is needed, i.e. ROBOT will not automatically include the relations), the extracted subset will then contain the definitions of the object properties but they will not be used at all (none of the classes in the extracted subset will have any relations).

gouttegd avatar Jun 13 '23 11:06 gouttegd

I appreciate your confidence in my understanding of graph theory, but I’m afraid that confidence is severely misplaced.

Haha sorry, I should have been clearer. While I am confident that you could, with a bit of time, characterise the algorithms, what I really meant to say is "describe the difference in the output at a high level", e.g. one has 1000 fewer is_a relations than the other and 10K more part_of, or some such, which is what you proceeded to do afterwards! Thank you!

I am not concerned. I think the subset stuff we have in ROBOT now is superior to OWLTools and we should just retire it, and see who screams.

matentzn avatar Jun 13 '23 11:06 matentzn

what I really meant to say is "describe the difference in the output at a high level", e.g. one has 1000 fewer is_a relations than the other and 10K more part_of, or some such

Well, you can have a look for yourself.

Here is the subset generated by owltools --extract-ontology-subset: bds_subset_owltools_nofillgaps.owl.txt

It contains precisely the 65 classes defined in the subset. It also contains almost all the object properties from the original ontology, but they are not used (none of the 65 classes in the subset has any relation to anything).

Here is the subset generated by owltools --extract-ontology-subset --fill-gaps: bds_subset_owltools_fillgaps.owl.txt

It contains 669 classes (including the 65 from the subset itself). It contains the same object properties as the previous one, but here they are used. All 669 classes have their full set of relations.

Here is the subset generated by robot extract --method subset --term-file subset.txt (where subset.txt contains the list of terms defined in the subset, obtained by a previous filter --select 'oboInOwl:inSubset=cl:BDS_subset' command): bds_subset_robot_extract_subset.owl.txt

It contains only the 65 classes of the subset itself. They have no relations (the object properties themselves are absent from the subset).

If I explicitly add the relations to the term-file argument (which I believe is necessary because of the change discussed here), this is the generated subset: bds_subset_robot_extract_subset_with_relations.owl.txt

It still contains only the 65 classes of the subset itself. The object properties are present in the output, but they are not used.

gouttegd avatar Jun 13 '23 12:06 gouttegd

Incidentally, the ROBOT version is horrendously slower than the OWLTOOLS version: on CL, robot extract --method subset takes more than 3 minutes (~195 seconds) while owltools --extract-ontology-subset takes ~15 seconds.

gouttegd avatar Jun 13 '23 12:06 gouttegd

I think the subset stuff we have in ROBOT now is superior to OWLTools and we should just retire it

What happened to “the --fill-gaps option is very crucial for this command”? I found no way of doing any kind of “gap filling” with ROBOT. As you can see in the examples above, the subset extracted by ROBOT always only contains the very terms marked with the inSubset annotation, and nothing more.

If there is a way to do with ROBOT what is done with owltools --extract-ontology-subset --fill-gaps, I would very much like to know it – that’s kind of what this entire ticket is about!

gouttegd avatar Jun 13 '23 12:06 gouttegd

I wonder if there has been a misunderstanding between “gap filling” and “gap spanning”. All the discussion in the ROBOT ticket about the requested new subset command seems to be about “gap spanning” (ensuring relations are preserved in the subset, even if they are “indirect” relations that involve some intermediate classes that are not in the subset).

The “gap filling” done by owltools --extract-ontology-subset --fill-gaps is about including intermediate classes in the subset, something that seemingly has never been proposed as a goal of ROBOT’s new subset command.
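
The distinction can be illustrated on a toy hierarchy. The following Python sketch is only an interpretation of the two notions as described above, not the algorithm used by owltools or by ROBOT; the class names, the `parents` map and both function names are made up for the example.

```python
def ancestors(term, parents):
    """Return all classes reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def span_gaps(subset, parents):
    """Gap spanning: keep only subset classes, bridging skipped intermediates.

    Each subset class is linked directly to its nearest ancestors that are
    also in the subset, so indirect relations survive the extraction.
    """
    edges = set()
    for term in subset:
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in subset:
                edges.add((term, current))  # nearest subset ancestor: stop here
            else:
                stack.extend(parents.get(current, ()))  # keep climbing
    return set(subset), edges


def fill_gaps(subset, parents):
    """Gap filling: also keep classes lying between two subset classes."""
    kept = set(subset)
    for term in subset:
        for anc in ancestors(term, parents):
            # An intermediate is an ancestor of a subset class that itself
            # has an ancestor in the subset.
            if anc not in subset and ancestors(anc, parents) & subset:
                kept.add(anc)
    # Keep the original (direct) edges between retained classes.
    edges = {(c, p) for c in kept for p in parents.get(c, ()) if p in kept}
    return kept, edges


# Hypothetical hierarchy (child -> parents): A -> B -> C -> D, and E -> B.
parents = {"A": ["B"], "B": ["C"], "C": ["D"], "E": ["B"]}
subset = {"A", "C"}

spanned = span_gaps(subset, parents)
filled = fill_gaps(subset, parents)
```

With this input, gap spanning keeps only A and C and links them with a single bridged edge, whereas gap filling additionally pulls in the intermediate class B and keeps the two original edges A → B and B → C.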

gouttegd avatar Jun 13 '23 12:06 gouttegd

Incidentally, the ROBOT version is horrendously slower than the OWLTOOLS version: on CL, robot extract --method subset takes more than 3 minutes (~195 seconds) while owltools --extract-ontology-subset takes ~15 seconds.

It's a completely different algorithm; it's running relation-graph internally which is pretty intensive (but logically complete).

balhoff avatar Jun 13 '23 13:06 balhoff

I made a separate issue for the gap-filling (include intermediates) option:

  • https://github.com/ontodev/robot/issues/1128

If this is implemented AND we are satisfied with efficiency THEN I believe we can remove owltools.

Regarding efficiency, it wasn't clear to me whether the comments about robot subset referred to running relation-graph (RG) with a property subset or with all properties. Note that even if this is addressed, there is still room for a more efficient operation that uses HOP rather than ENTAILMENT. See the OAK docs for an explanation of this: https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html#graph-traversal-strategies

cmungall avatar Jul 07 '23 00:07 cmungall