robot
robot copied to clipboard
robot extract/merge should optionally inject provenance
Resolves [#893 , resolves #893]
- [x]
docs/have been added/updated - [x] tests have been added/updated
- [x]
mvn verifysays all tests pass - [x]
mvn sitesays all JavaDocs correct - [x]
CHANGELOG.mdhas been updated
I'm struggling to find the time, but I will look at this.
@hkir-dev Thanks for the PR. I finally got a chance to take a first look. Overall it looks good. I have one request at this point: can you please make the example files smaller? It's great that you included the examples and tests, but they are just larger than they need to be to demonstrate the purpose and effect of the command.
When you make this change, I'll take another look, and then ask @beckyjackson to review.
Sorry again for the delay -- I'll be faster next time.
@jamesaoverton I simplified the example files. I can happily implement further improvements if required.
@hkir-dev Much appreciated! @beckyjackson can you please review this?
@beckyjackson Updated CHANGELOG to point PR instead of the issue to be compatible with old ones.
It would be nice if the annotation property used for axioms was configurable, but then again that option that could be added in a backwards-compatible way later.
What is the intended behaviour for entities and axioms that are not in the ontology namespace, such as RDF, RDFS, OIO, PROV, etc.? The current code marks all terms as coming from the source file, but that is false in these cases.
@jamesaoverton I don't think your concern applies to axioms. I want to have axioms annotated based on which file they are coming from, regardless of any namespaces involved.
@balhoff You're right. I updated my comment. The main purpose is to run this on a base file, which should not include external/upstream axioms.
But in even in a base file, OWLAPI will include "stubs" for upstream entities from RDF, RDFS, PROV, etc. and their rdf:types, because it OWL needs various type information. Just saying every entity in the source file is defined by that source file is misleading. Nico's proposal 3 would filter for namespaces, like we do when constructing the base file, so I think the result will be correct.
I'm pretty sure we're in agreement as far as entity annotations. But just to be clear, if one of the input files ('myont-base.owl') says rdfs:comment rdf:type owl:AnnotationProperty, I want that axiom annotated with prov:wasDerivedFrom <myont-base.owl> in the output, so I know which of the input files made that declaration.
The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.
I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split. So:
Example:1
A sub B [source: Nico]
A sub B
becomes
A sub B [source: Nico]
and
Example:2
A sub B [source: Nico]
A sub B [source: James]
becomes:
A sub B [source: Nico, source: James]
I think we need a general "normalise" command in ROBOT - this is the only owltools dependency I still use regularly which fits exactly this use case here. The normalise method basically puts an index on all axioms without annotations, and then adds all annotations to those axioms regardless of across how many axioms it was previously split.
That would be fine, but it doesn't affect my opinion about how this command should work. For this case, if an axiom came from multiple ontologies, it would be clearer to see it multiple times, especially if the versions from the different ontologies had their own annotations coming from those ontologies.
Ah didnt think of that case where the same axioms comes from two sources.. Maybe. Ok. Yeah, in any case I agree that this is needed.
I finally got around to trying this—generally looks great!. My main interest is in annotating axioms with
wasDerivedFrominmerge. I ran this test:./bin/robot merge -I 'http://purl.obolibrary.org/obo/uberon/uberon-base.owl' -I 'http://purl.obolibrary.org/obo/cl/cl-base.owl' -I 'http://purl.obolibrary.org/obo/pato/pato-base.owl' -I 'http://purl.obolibrary.org/obo/ro/ro-base.owl' -I 'http://purl.obolibrary.org/obo/go/go-base.owl' --annotate-derived-from true extract --method BOT --term 'http://purl.obolibrary.org/obo/BFO_0000051' --term 'http://purl.obolibrary.org/obo/PATO_0000912' --term 'http://purl.obolibrary.org/obo/RO_0002314' --term 'http://purl.obolibrary.org/obo/GO_0016049' --term 'http://purl.obolibrary.org/obo/BFO_0000066' --term 'http://purl.obolibrary.org/obo/CL_0000127' --term 'http://purl.obolibrary.org/obo/RO_0002573' --term 'http://purl.obolibrary.org/obo/PATO_0000460' -o module.ofnIt's awesome to see the origin of all the axioms in the extracted module. But there is one thing I think should be fixed before merging. The resulting ontology includes not only the annotated axioms but also the unannotated versions. So they are all duplicated. The unannotated axioms should be removed.
@balhoff thank you for noticing this. I fixed the issue and added three unit tests to MergeOperationTest.java.
Hey everybody. I lost track of this over the summer, between conferences and vacations. I apologize.
I just want to check one more time that this is really, really the behaviour you want, especially for the core vocabularies such as RDF, RDFS, PROV, etc. For example: https://github.com/ontodev/robot/blob/e7a10975c51ede4960d47025ff2a966d257770a5/robot-core/src/test/resources/simple_defined_by.owl#L26-L28
It says rdfs:label rdfs:isDefinedBy simple.owl. I can see how that would be useful for debugging certain cases, but that's not at all what I would expect from an "isDefinedBy" predicate. On the other hand, the RDFS spec is no help. It says rdfs:isDefinedBy rdfs:comment "The defininition of the subject resource." What the heck does that mean? I guess it makes more sense as a subproperty of rdfs:seeAlso which has rdfs:comment "Further information about the subject resource.".
Personally I think provenance for this feature is about the axioms, not the entities. I would rather remove the entity annotation feature and keep the axiom annotation feature if that meant this PR could be merged and released sooner.
It depends on the perspective - it is true that for the provenance feature, axioms are the critical part, but our other goal is to try and establish some level of "ownership" for an existing term, a place it belongs.. I think we can go ahead with this feature here as it is. We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,
We will add the "isDefinedBy" annotations to OMO to make sure they are not overwritten again,
That's a good idea!
Just adding isDefinedBy annotations to OMO is an OBO-centric and pretty narrow "solution" to this. Since this is just an option, maybe that's OK. This is one of those decisions that I'm worried I'll regret. Let me think about it a bit more.
We could introduce a flag here and leave it to the user. --annotate-built-in-namespaces which we set to true by default.
EDIT:
or if you don't like "built-in" you could provide a list: --ignore-base-iri http://skos.org/
When it comes down to it, owl and rdf vocabulary are really the questionable ones to be annotated right? Or is there anything else?
The easiest problems to point to for this isDefinedBy behaviour are built-ins (RDF, RDFS, OWL), but it could easily be oboInOwl, FOAF, or anything. I have no interest in updating and maintaining an include-list of these. I'm clear on the problems this is creating, and much less clear on the problems that it's supposed to be solving.
I expect to be answering issues from confused users for years to come, but I'll merge anyway.