Add gap-fill option to subset method in extract
I am creating this issue as a continuation of @gouttegd's comment here:
- https://github.com/INCATools/ontology-development-kit/issues/622#issuecomment-1589252639
robot extract has a subset method described here, which is most useful. In the terminology of owltools, this method implements "gap spanning". owltools had an additional option when making subsets, "gap filling". This would add all intermediate nodes and edges between terms in the subset.
This can be illustrated using the test subset ontology.
Let's make a subset:
cat > SUBSET
ONT:1
ONT:4
We can then look at the difference between the two commands:
owltools ./docs/examples/subset.obo --extract-ontology-subset -i SUBSET --fill-gaps -o -f obo /tmp/gap-filled.obo
owltools ./docs/examples/subset.obo --extract-ontology-subset -i SUBSET -o -f obo /tmp/gap-spanned.obo
gap-spanned.obo:
format-version: 1.2
subsetdef: foo "foo"
ontology: test-subset
[Term]
id: ONT:1
subset: foo
[Term]
id: ONT:4
relationship: part_of ONT:1
[Typedef]
id: overlaps
xref: RO:0002131
[Typedef]
id: part_of
xref: BFO:0000050
is_transitive: true
Note that the part_of relationship between ONT:4 and ONT:1 is not asserted in the source ontology; it is entailed.
gap-filled.obo:
format-version: 1.2
subsetdef: foo "foo"
ontology: test-subset
[Term]
id: ONT:1
subset: foo
[Term]
id: ONT:2
relationship: part_of ONT:1
[Term]
id: ONT:3
relationship: part_of ONT:2
[Term]
id: ONT:4
relationship: part_of ONT:3
[Typedef]
id: overlaps
xref: RO:0002131
[Typedef]
id: part_of
xref: BFO:0000050
is_transitive: true
Proposal:
either a
- new sibling --method option called something like intermediate-filled-subset
- new option on extract that is only applicable for subset called something like --include-intermediate-nodes
(We should name this carefully; the owltools terminology of gap filling/spanning is not great.)
Unlike the default subset approach, which connects subset terms via entailed edges, this would traverse all intermediate nodes via direct edges.
Algorithm:
SubsetWithIntermediates(O, S, P):
    S' = S
    for each t in O-S:
        if there exist t1, t2 in S such that <t1 P t>, <t P t2>:
            add t to S'
    O' = {}
    for e in RGdirect(O):
        if e.s in S' and e.o in S' then add e to O'
Here <s P o> means that there exists a direct or indirect relation-graph edge <s p o> such that p is in P or p is rdfs:subClassOf.
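To make the pseudocode concrete, here is a minimal Python sketch of it. It is not ROBOT or owltools code. It assumes the ontology is already available as a list of direct (subject, predicate, object) edges, and it approximates the entailed <s P o> test as simple reachability over edges whose predicate is in P or is rdfs:subClassOf, which ignores property chains and non-transitive properties. The function name, the EDGES list, and the extra term EX:99 are hypothetical and exist only for illustration; the three ONT edges are the ones shown in gap-filled.obo above.

```python
from collections import defaultdict


def subset_with_intermediates(direct_edges, subset, predicates):
    """Return (S', O'): the expanded term set and the direct edges among its members.

    direct_edges: list of (subject, predicate, object) tuples, direct edges only.
    subset:       the seed terms S.
    predicates:   the predicates P; rdfs:subClassOf is always allowed as well.
    """
    subset = set(subset)
    allowed = set(predicates) | {"rdfs:subClassOf"}

    # Child -> parents adjacency, restricted to the allowed predicates.
    parents = defaultdict(set)
    for s, p, o in direct_edges:
        if p in allowed:
            parents[s].add(o)

    def ancestors(term):
        # Assumption: reachability stands in for the entailed <term P x> edges.
        seen, stack = set(), [term]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    nodes = {n for s, _, o in direct_edges for n in (s, o)} | subset
    anc = {n: ancestors(n) for n in nodes}

    # S' = S plus every term lying between two subset terms.
    expanded = set(subset)
    for t in nodes - subset:
        below_subset_term = bool(anc[t] & subset)          # <t P t2>
        above_subset_term = any(t in anc[s] for s in subset)  # <t1 P t>
        if below_subset_term and above_subset_term:
            expanded.add(t)

    # O' = direct edges whose subject and object are both in S'.
    kept = [(s, p, o) for s, p, o in direct_edges if s in expanded and o in expanded]
    return expanded, kept


# Direct edges from the gap-filled output, plus one hypothetical term (EX:99)
# that hangs below ONT:4 and should therefore be left out.
EDGES = [
    ("ONT:2", "part_of", "ONT:1"),
    ("ONT:3", "part_of", "ONT:2"),
    ("ONT:4", "part_of", "ONT:3"),
    ("EX:99", "part_of", "ONT:4"),
]

expanded, kept = subset_with_intermediates(EDGES, {"ONT:1", "ONT:4"}, {"part_of"})
print(sorted(expanded))  # ['ONT:1', 'ONT:2', 'ONT:3', 'ONT:4']
```

With the seed subset ONT:1 and ONT:4, the sketch pulls in ONT:2 and ONT:3 and keeps the three direct part_of edges, matching the gap-filled output. As in the pseudocode, the final edge filter does not restrict by predicate; whether it should is a design question for the real implementation.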
This would be useful (as would a general specification of an algo for graph traversal using the UberGraph redundant graph).
Of your two options, this is clearer to me:
- new option on extract that is only applicable for subset called something like --include-intermediate-nodes
I feel very uncomfortable committing to implementing this. @dosumis could we put this on @hkir-dev's plate perhaps? I have just assigned 12 robot issues to myself, wanting to do a push.