Add gap-fill option to subset method in extract

Open cmungall opened this issue 2 years ago • 3 comments

I am creating this issue as a continuation of @gouttegd's comment here:

  • https://github.com/INCATools/ontology-development-kit/issues/622#issuecomment-1589252639

robot extract has a subset method described here, which is most useful. In the terminology of owltools, this method implements "gap spanning". owltools had an additional option when making subsets, "gap filling". This would add all intermediate nodes and edges between terms in the subset.

this can be illustrated using the test subset ontology

let's make a subset:

cat > SUBSET
ONT:1
ONT:4

we can then look at the difference between the two commands:

owltools  ./docs/examples/subset.obo --extract-ontology-subset -i SUBSET --fill-gaps -o -f obo /tmp/gap-filled.obo
owltools  ./docs/examples/subset.obo --extract-ontology-subset -i SUBSET  -o -f obo /tmp/gap-spanned.obo

gap-spanned.obo:

format-version: 1.2
subsetdef: foo "foo"
ontology: test-subset

[Term]
id: ONT:1
subset: foo

[Term]
id: ONT:4
relationship: part_of ONT:1

[Typedef]
id: overlaps
xref: RO:0002131

[Typedef]
id: part_of
xref: BFO:0000050
is_transitive: true

note that the part-of between 4 and 1 is not asserted, it is entailed

gap-filled.obo:

format-version: 1.2
subsetdef: foo "foo"
ontology: test-subset

[Term]
id: ONT:1
subset: foo

[Term]
id: ONT:2
relationship: part_of ONT:1

[Term]
id: ONT:3
relationship: part_of ONT:2

[Term]
id: ONT:4
relationship: part_of ONT:3

[Typedef]
id: overlaps
xref: RO:0002131

[Typedef]
id: part_of
xref: BFO:0000050
is_transitive: true

Proposal:

either:

  • a new sibling --method option called something like intermediate-filled-subset, or
  • a new option on extract, applicable only to the subset method, called something like --include-intermediate-nodes

(we should name this carefully, the owltools terminology of gap filling/spanning is not great)

unlike the default subset approach which connects subset terms via entailed edges, this would traverse all intermediate nodes via direct edges.

Algorithm:

SubsetWithIntermediates(O,S,P):
  S' = S
  for each t in O-S:
    if there exists t1, t2 in S such that <t1 P t>, <t P t2>:
       add t to S'
  O' = {}
  for e in RGdirect(O):
    if e.s in S' and e.o in S' then add e to O'

Here <s P o> means that there exists a relation graph direct or indirect edge <s p o> such that p is in P or p is rdfs:subClassOf
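
The pseudocode above can be sketched in Python roughly as follows. This is only an illustration, not a proposed robot implementation: it assumes the direct relation-graph edges are already available as plain (subject, predicate, object) tuples, and it approximates "direct or indirect edge" as simple reachability over edges whose predicate is in P or is rdfs:subClassOf, which ignores property chains and per-property transitivity. The function names and the extra term ONT:5 are made up for the example.

```python
from collections import defaultdict


def reachable_pairs(edges):
    """Return every (s, o) pair connected by a path of one or more edges."""
    succ = defaultdict(set)
    for s, _p, o in edges:
        succ[s].add(o)
    pairs = set()
    for start in list(succ):
        seen = set()
        stack = list(succ[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(succ.get(node, ()))
        pairs.update((start, n) for n in seen)
    return pairs


def subset_with_intermediates(direct_edges, subset, predicates):
    """Sketch of SubsetWithIntermediates(O, S, P).

    direct_edges: iterable of (s, p, o) tuples standing in for RGdirect(O).
    Assumption: <s P o> is approximated by path reachability over edges
    whose predicate is in P or is rdfs:subClassOf.
    """
    direct_edges = list(direct_edges)
    subset = set(subset)
    allowed = set(predicates) | {"rdfs:subClassOf"}
    reach = reachable_pairs(e for e in direct_edges if e[1] in allowed)

    all_terms = {n for s, _p, o in direct_edges for n in (s, o)}
    expanded = set(subset)  # S'
    for t in all_terms - subset:
        below = any((t1, t) in reach for t1 in subset)  # <t1 P t>
        above = any((t, t2) in reach for t2 in subset)  # <t P t2>
        if below and above:
            expanded.add(t)

    # O': every direct edge whose subject and object are both in S'
    kept = [e for e in direct_edges if e[0] in expanded and e[2] in expanded]
    return expanded, kept


# Direct edges mirroring the gap-filled output above, plus a made-up
# term ONT:5 that sits below the subset and should be left out.
EDGES = [
    ("ONT:2", "part_of", "ONT:1"),
    ("ONT:3", "part_of", "ONT:2"),
    ("ONT:4", "part_of", "ONT:3"),
    ("ONT:5", "part_of", "ONT:4"),
]

expanded, kept = subset_with_intermediates(EDGES, {"ONT:1", "ONT:4"}, {"part_of"})
print(sorted(expanded))  # ['ONT:1', 'ONT:2', 'ONT:3', 'ONT:4']
```

On this toy input the kept edges are the three part_of links 4 → 3 → 2 → 1, matching the gap-filled output, while ONT:5 is dropped because no subset term lies below it.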

cmungall avatar Jul 07 '23 00:07 cmungall

This would be useful (as would a general specification of an algo for graph traversal using the UberGraph redundant graph).

dosumis avatar Jul 07 '23 09:07 dosumis

Of your two options, this is clearer to me:

  • new option on extract that is only applicable for subset called something like --include-intermediate-nodes

dosumis avatar Jul 07 '23 09:07 dosumis

I feel very uncomfortable committing to implementing this. @dosumis, could we put this on @hkir-dev's plate perhaps? I have just assigned 12 robot issues to myself, wanting to do a push.

matentzn avatar Aug 05 '23 17:08 matentzn