robot robot extract/merge should optionally inject provenance

Resolves [#893 , resolves #893]

[x] docs/ have been added/updated
[x] tests have been added/updated
[x] mvn verify says all tests pass
[x] mvn site says all JavaDocs correct
[x] CHANGELOG.md has been updated

Mar 14 '22 14:03 hkir-dev

I'm struggling to find the time, but I will look at this.

Mar 21 '22 16:03 jamesaoverton

@hkir-dev Thanks for the PR. I finally got a chance to take a first look. Overall it looks good. I have one request at this point: can you please make the example files smaller? It's great that you included the examples and tests, but they are just larger than they need to be to demonstrate the purpose and effect of the command.

When you make this change, I'll take another look, and then ask @beckyjackson to review.

Sorry again for the delay -- I'll be faster next time.

Apr 05 '22 14:04 jamesaoverton

@jamesaoverton I simplified the example files. I can happily implement further improvements if required.

Apr 06 '22 10:04 hkir-dev

@hkir-dev Much appreciated! @beckyjackson can you please review this?

Apr 06 '22 13:04 jamesaoverton

@beckyjackson Updated CHANGELOG to point PR instead of the issue to be compatible with old ones.

Apr 07 '22 07:04 hkir-dev

It would be nice if the annotation property used for axioms was configurable, but then again that option that could be added in a backwards-compatible way later.

Jun 10 '22 14:06 balhoff

What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.

@jamesaoverton I don't think your concern applies to axioms. I want to have axioms annotated based on which file they are coming from, regardless of any namespaces involved.

Jun 10 '22 14:06 balhoff

@balhoff You're right. I updated my comment. The main purpose is to run this on a base file, which should not include external/upstream axioms.

But in even in a base file, OWLAPI will include "stubs" for upstream entities from RDF, RDFS, PROV, etc. and their rdf:types, because it OWL needs various type information. Just saying every entity in the source file is defined by that source file is misleading. Nico's proposal 3 would filter for namespaces, like we do when constructing the base file, so I think the result will be correct.

Jun 10 '22 14:06 jamesaoverton

I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.

Jun 10 '22 14:06 balhoff

The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split. So:

Example:1

A sub B [source: Nico]
A sub B

becomes

A sub B [source: Nico]

and

Example:2

A sub B [source: Nico]
A sub B [source: James]

becomes:

A sub B [source: Nico, source: James]

Jun 10 '22 14:06 matentzn

I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split.

That would be fine, but it doesn't affect my opinion about how this command should work. For this case, if an axiom came from multiple ontologies, it would be clearer to see it multiple times, especially if the versions from the different ontologies had their own annotations coming from those ontologies.

Jun 10 '22 15:06 balhoff

Ah didnt think of that case where the same axioms comes from two sources.. Maybe. Ok. Yeah, in any case I agree that this is needed.

Jun 10 '22 15:06 matentzn

I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with wasDerivedFrom in merge. I ran this test:
./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofn
It's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.

@balhoff thank you for noticing this. I fixed the issue and added three unit tests to MergeOperationTest.java.

Jun 29 '22 13:06 hkir-dev

Hey everybody. I lost track of this over the summer, between conferences and vacations. I apologize.

I just want to check one more time that this is really, really the behaviour you want, especially for the core vocabularies such as RDF, RDFS, PROV, etc. For example: https://github.com/ontodev/robot/blob/e7a10975c51ede4960d47025ff2a966d257770a5/robot-core/src/test/resources/simple_defined_by.owl#L26-L28

It says rdfs:label rdfs:isDefinedBy simple.owl. I can see how that would be useful for debugging certain cases, but that's not at all what I would expect from an "isDefinedBy" predicate. On the other hand, the RDFS spec is no help. It says rdfs:isDefinedBy rdfs:comment "The defininition of the subject resource." What the heck does that mean? I guess it makes more sense as a subproperty of rdfs:seeAlso which has rdfs:comment "Further information about the subject resource.".

Sep 30 '22 15:09 jamesaoverton

Personally I think provenance for this feature is about the axioms, not the entities. I would rather remove the entity annotation feature and keep the axiom annotation feature if that meant this PR could be merged and released sooner.

Oct 13 '22 17:10 balhoff

It depends on the perspective - it is true that for the provenance feature, axioms are the critical part, but our other goal is to try and establish some level of "ownership" for an existing term, a place it belongs.. I think we can go ahead with this feature here as it is. We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

Oct 24 '22 08:10 matentzn

We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,

That's a good idea!

Oct 24 '22 14:10 balhoff

Just adding isDefinedBy annotations to OMO is an OBO-centric and pretty narrow "solution" to this. Since this is just an option, maybe that's OK. This is one of those decisions that I'm worried I'll regret. Let me think about it a bit more.

Oct 24 '22 15:10 jamesaoverton

We could introduce a flag here and leave it to the user. --annotate-built-in-namespaces which we set to true by default.

EDIT: or if you don't like "built-in" you could provide a list: --ignore-base-iri http://skos.org/

Oct 25 '22 10:10 matentzn

When it comes down to it, owl and rdf vocabulary are really the questionable ones to be annotated right? Or is there anything else?

Oct 28 '22 17:10 matentzn

The easiest problems to point to for this isDefinedBy behaviour are built-ins (RDF, RDFS, OWL), but it could easily be oboInOwl, FOAF, or anything. I have no interest in updating and maintaining an include-list of these. I'm clear on the problems this is creating, and much less clear on the problems that it's supposed to be solving.

I expect to be answering issues from confused users for years to come, but I'll merge anyway.

Oct 28 '22 19:10 jamesaoverton

robot robot copied to clipboard

robot extract/merge should optionally inject provenance

robot
robot copied to clipboard