Question about use of robot extract

Open rpgoldman opened this issue 4 years ago • 12 comments

The docs state:

The reuse of ontology terms creates links between data, making the ontology and the data more valuable. But often you want to reuse just a subset of terms from a target ontology, not the whole thing.

I want to pull a number of terms, including individuals from an ontology (OM-2). For example, I would like to get enough of the ontology to include definitions for the om:microlitre, om:millimetre, om:litre and om:metre individuals.

It's not clear how this use case interacts with the --individuals option, which seems to implicitly assume that only classes matter in choosing the subset of the ontology:

Important note for ontologies that include individuals: When using the SLME method of extraction, all individuals (ABox axioms) and their class types (the TBox axioms they depend on) are included by default. The extract command provides an --individuals option to specify what (if any) individuals are included in the output ontology....

There doesn't seem to be an option to carry all the input terms that are individuals, together with required definitional concepts, into the output ontology.

If somebody can clue me in about this, I can try to make a PR with a doc fix....

rpgoldman avatar Oct 19 '21 20:10 rpgoldman

@rpgoldman it is true that you cannot use BOT, TOP, STAR (all SLME variants) when dealing with anything involving individuals.

I would suggest you do the following (it sounds complicated, but in the end it's not thaaaat complex):

  1. Use ROBOT remove to delete all individuals from the ontology except the ones you care about (robot remove ... -T signature.txt --select complement --select individuals)
  2. Then use extract -T signature.txt

You can put them both in the same pipe. signature.txt is a text file that contains the elements you are looking to extract.
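As a rough sketch, the two steps above can be written as one chained invocation, since ROBOT hands the result of one command to the next when both appear on the same command line. The file names om-2.0.omn and om-subset.omn, the choice of BOT, and the four OM-2 IRIs written to signature.txt are assumptions for illustration; the guard only skips the ROBOT call when the tool or the input file is not available.

```shell
# signature.txt: one IRI per line (assumed IRIs for the four units in the question).
OM='http://www.ontology-of-units-of-measure.org/resource/om-2'
printf '%s\n' "$OM/microlitre" "$OM/millimetre" "$OM/litre" "$OM/metre" > signature.txt

# Step 1 (remove) is meant to drop individuals outside the signature;
# step 2 (extract) then builds the module from what is left.
if command -v robot >/dev/null 2>&1 && [ -f om-2.0.omn ]; then
  robot remove --input om-2.0.omn \
      --term-file signature.txt \
      --select complement --select individuals \
    extract --method BOT \
      --term-file signature.txt \
      --output om-subset.omn
fi
```

Whether the remove step keeps the classes you need is worth checking on the real ontology before relying on the extracted module.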

matentzn avatar Oct 20 '21 10:10 matentzn

@matentzn Am I correct in thinking that any terms in the term list will be included, despite the value of the --individuals option to robot extract?

Is it also true that whatever entities they require according to the rules of BOT, TOP, or STAR will also be included?

If so, I will make a candidate PR for the extract docs to try to clarify this.

rpgoldman avatar Oct 20 '21 16:10 rpgoldman

The purpose of the --individuals parameter is simply to allow people to ignore individuals before extracting the module. So if you include an individual in the term list, then use --individuals exclude, you will not get the individuals back. Hence my suggestion to use robot remove to first remove all irrelevant individuals, and then use extract without the individuals option.

matentzn avatar Oct 20 '21 19:10 matentzn

The purpose of the --individuals parameter is simply to allow people to ignore individuals before extracting the module. So if you include an individual in the term list, then use --individuals exclude, you will not get the individuals back. Hence my suggestion to use robot remove to first remove all irrelevant individuals, and then use extract without the individuals option.

I had been using --individuals minimal after removing the irrelevant individuals and that seemed to work. But you are suggesting just leaving it out altogether, correct?

rpgoldman avatar Oct 20 '21 19:10 rpgoldman

@matentzn -- I don't think I have a good mental model of how extract is working....

Here are two files, the first produced with --individuals minimal and the second with no --individuals flag at all. The diff below shows that the only change in the latter is the addition of more facts about the hasFactor DataProperty, which pulls in one more class definition (to cover the domain of hasFactor):

$ diff om-subset.omn om-subset2.omn
87a88,100
>     Annotations:
>         rdfs:label "has factor"@en,
>         rdfs:label "因子を持つ"@ja
>
>     Characteristics:
>         Functional
>
>     Domain:
>         <http://www.ontology-of-units-of-measure.org/resource/om-2/Prefix> or <http://www.ontology-of-units-of-measure.org/resource/om-2/Scale> or <http://www.ontology-of-units-of-measure.org/resource/om-2/SingularUnit> or <http://www.ontology-of-units-of-measure.org/resource/om-2/UnitMultiple>
>
>     Range:
>         xsd:float
>
164a178,184
>
>
> Class: <http://www.ontology-of-units-of-measure.org/resource/om-2/Prefix>
>
>     Annotations:
>         rdfs:comment "A prefix is a name that precedes a basic unit of measure to indicate a decimal or binary multiple or fraction of the unit. Each prefix has a unique symbol that is prepended to the unit symbol. For example, an electric current of 0.000 000 001 ampere is written by using the SI-prefix nano as 1 nanoampere or 1 nA."@en,
>         rdfs:label "prefix"@en

I don't get how the change in the individuals option accounts for this change in the results. I'm attaching both results files here.

om-subset.omn.txt om-subset2.omn.txt

rpgoldman avatar Oct 20 '21 20:10 rpgoldman

Hmm, I can't explain that particular change. Looks like a bug? The extract method is based on some formal logic approach that just does not deal with individuals. For SLME, individuals are considered global, so all individuals are always included in all modules no matter what you put in the term file.

The --individuals option is a hack in ROBOT to work with SLME, not a formal procedure for module extraction. The only thing it really does is help you avoid individuals; it does not really help you deal with them. So unfortunately the only thing I can say is: extract with BOT, STAR, TOP should not be used with ontologies with individuals at all.

Sorry if this is not clear - it's a bit of a difficult subject.

matentzn avatar Oct 20 '21 20:10 matentzn

@matentzn That was very clear, thank you!

Does this make sense as an approach?

Let S be the set of IRIs, including individuals.

  1. Classify all s ∈ S where s is an individual
  2. Add to S all direct superclasses of s | s ∈ S ∧ s an Individual
  3. Remove from the ontology all the individuals ∉ S with robot remove, per your earlier comment.
  4. Then use robot extract -T with the augmented S
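Step 2 could be sketched with plain text tools, assuming the direct types of each individual are already available as a two-column table. How to produce that table (for example from materialized output) is the open question here, so types.tsv below is a made-up toy table with hypothetical ex: names, not data taken from OM-2.

```shell
# Toy "individual<TAB>direct type" table; hypothetical names, not taken from OM-2.
printf '%s\t%s\n' \
  ex:metre ex:SingularUnit \
  ex:litre ex:SingularUnit \
  ex:millimetre ex:PrefixedUnit \
  ex:gram ex:SingularUnit > types.tsv

# S: the requested terms (here, two individuals).
printf '%s\n' ex:metre ex:millimetre > S.txt

# Step 2: look up the direct type of every individual in S ...
awk -F'\t' 'NR==FNR { want[$1]; next } $1 in want { print $2 }' S.txt types.tsv > S-types.txt
# ... and add those types to S, dropping duplicates.
sort -u S.txt S-types.txt > S-augmented.txt
```

S-augmented.txt would then serve as the term file for steps 3 and 4.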

One thing that I am not sure about is the way to find all of the direct superclasses of each individual. Would materialize do that? Actually, it looks like the documentation page for materialize suggests a different recipe to do something very similar: http://robot.obolibrary.org/materialize

Actually, I'm also not clear about how to extract the types produced by materialize in step 1, and I think it would probably be a good thing to remove extraneous individuals before Step 1....

rpgoldman avatar Oct 20 '21 22:10 rpgoldman

@matentzn

1. Use ROBOT remove to delete all individuals from the ontology except the ones you care about (`robot remove ... -T signature.txt --select complement --select individuals`)

Doesn't the above give me only the individuals in the ontology -- I mean with all the rest of the ontology stripped out?

When I did

robot remove -vv --input om-2.0.omn --select individuals --select complement --term-file absolute-om-symbols.txt --output selected.omn

I got a file with only the ontology header, some annotation property definitions used in the header, and the individuals in absolute-om-symbols.txt. Everything else was gone: all the class definitions, typing axioms, etc., including even the definitions of classes named in the term file.
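One way to double-check what a remove call left behind is to list the declared classes and named individuals in its output with robot query. This is only a sketch: it assumes the output file selected.omn from the command above and that the entities carry declarations, and the names remaining.rq and remaining.tsv are hypothetical.

```shell
# remaining.rq: hypothetical query listing each declared class and named individual.
cat > remaining.rq <<'EOF'
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?kind ?entity WHERE {
  VALUES ?kind { owl:Class owl:NamedIndividual }
  ?entity a ?kind .
  FILTER(isIRI(?entity))
}
ORDER BY ?kind ?entity
EOF

# Run the query only when ROBOT and the file to inspect are both present.
if command -v robot >/dev/null 2>&1 && [ -f selected.omn ]; then
  robot query --input selected.omn --query remaining.rq remaining.tsv
fi
```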

Is this an alternative?

robot filter -vv --input om-2.0.omn --select "ontology individuals ancestors annotations properties types domains ranges equivalents imports"  --term-file absolute-om-symbols.txt --axioms "all" --output filtered.omn

I'm not sure, because I suppose this doesn't contain materialized facts about the selected terms (i.e., they are not classified).

Trying to chain materialize into filter now....

rpgoldman avatar Oct 20 '21 22:10 rpgoldman

Try something like this:

robot remove -vv --input om-2.0.omn --term-file absolute-om-symbols.txt  --select complement --select individuals --output selected.omn

I think it works like this:

  1. --term-file/T ensures that we select all the terms we want
  2. --select complement selects all other terms, i.e. the ones not in our list
  3. --select individuals selects the individuals among the ones we don't want

After that, we should be able to run the extract -M BOT (even without the --individuals option).

And then try the reason/materialize commands again. Maybe that works?

matentzn avatar Oct 21 '21 09:10 matentzn

Shouldn't I do the materialize before the extraction, in order to avoid having inferred superclasses of an individual be stripped out before the individual is classified?

Also, do I want to do remove with --select complement --select individuals or --select "complement individuals"?

rpgoldman avatar Oct 21 '21 15:10 rpgoldman

not sure what the difference is, but I use the first :)

matentzn avatar Oct 21 '21 16:10 matentzn

http://robot.obolibrary.org/remove#selectors

--select complement --select individuals executes in series:

  1. start with target set A
  2. get the complement of target set A to generate target set B
  3. get the individuals from target set B to generate target set C
  4. output target set C

--select "complement individuals" executes in parallel and takes the union

  1. start with target set A
  2. get the complement of target set A to generate target set B
  3. get the individuals from target set A to generate target set C
  4. output the union of target set B and target set C
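The difference between the two forms can be illustrated with a toy model built from text files. This is a sketch of the set logic described above, not of ROBOT's implementation: the universe in entities.tsv, the ex: names, and the helper functions complement and individuals are all hypothetical.

```shell
# Toy universe: "entity<TAB>kind"; names are hypothetical.
printf '%s\t%s\n' \
  ex:Unit class \
  ex:Prefix class \
  ex:metre individual \
  ex:litre individual \
  ex:gram individual > entities.tsv
# Target set A: the terms from the term file.
printf '%s\n' ex:Unit ex:metre > A.txt

# complement FILE: every entity in the universe that is not listed in FILE.
complement() { cut -f1 entities.tsv | grep -Fvx -f "$1"; }
# individuals FILE: the members of FILE that are individuals.
individuals() {
  awk -F'\t' '$2 == "individual" { print $1 }' entities.tsv | grep -Fx -f "$1"
}

# Series: --select complement --select individuals
complement A.txt > B.txt              # B = everything not in A
individuals B.txt | sort > series.txt # C = individuals within B

# Parallel: --select "complement individuals"
{ complement A.txt; individuals A.txt; } | sort -u > parallel.txt
```

In this toy model the series form selects only ex:litre and ex:gram (the individuals outside the term file), while the parallel form selects everything outside the term file plus ex:metre. With remove, the selected set is what gets deleted, so the two forms would give very different outputs.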

jamesaoverton avatar Oct 21 '21 16:10 jamesaoverton