robot icon indicating copy to clipboard operation
robot copied to clipboard

Question about repairing replacedBy multiple classes

Open psiotwo opened this issue 3 years ago • 11 comments

IAO_0100001 (replaced by) is handled by the repair command.

Consider this input:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

:c1 a owl:Class ;
    rdfs:label "l1"@en .

:c2 a owl:Class ;
    rdfs:label "l2"@en .

:c3 a owl:Class ;
    owl:deprecated true ;
    obo:IAO_0100001 :c1 ;
    obo:IAO_0100001 :c2 ;
    rdfs:label "l3"@en .

Robot picks (randomly?) one of the class to be used for replacement, producing after robot repair -i input.ttl --annotation-property rdfs:label -o output.ttl the following output.ttl:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

[ a owl:Ontology ] .

obo:IAO_0100001 a owl:AnnotationProperty .

:c1 a owl:Class ;
      rdfs:label "l1"@en .
:c2 a owl:Class ;
      rdfs:label "l2"@en , "l3"@en .

:c3 a owl:Class ;
     owl:deprecated true ;
     obo:IAO_0100001 :c1, :c2.

I am not defending the usefulness of the example itself (owl:equivalentClass statements should be handled transitively, in which case this problem would not be needed to solve as :c1 and :c2 should have been merged into a single concept anyway). Yet, technically, for some technical reasons, I need to keep :c1 and :c2 separate and hang "l3" to both concepts. I would have expected that repair operation itself does not make any assumption on the number of replacing classes ... but it is obviously not the case.

Did you come across this in any of your use-cases?

psiotwo avatar Sep 15 '22 11:09 psiotwo

It seems what you are doing here is a split (replacing 1 class by two class with more specific meanings). We have a very strong assumptions in obo:

  1. oio:replaced_by is used for exact replacements
  2. no two classes in the ontology can be the same

From this follows that replaced_by is functional, i.e. can only exist once.

You can use oio:consider instead here, but repair will probably not do anything automatically.

All this does not answer your questions, but I just don't think repair is the right tool here.

matentzn avatar Sep 15 '22 11:09 matentzn

@matentzn thanks for response. In fact :c1 and :c2 are not meant to be more specific in my case - but for technical reasons I would like to postpone their merging to another pipeline step. So I basically need an intermediate ontology artifact in the pipeline to be violating 2. in your assumption list above (also in the example it has not been decided on the deprecation of either :c1 or :c2 yet).

psiotwo avatar Sep 15 '22 12:09 psiotwo

Got it. In this case I will leave to to James to decide.. your proposal supports a case that is in principle out of scope for repair I think, it also actually creates an invalid ontology (same label violation), which does not seem to me like a desirable repair.. I would suggest you solve your case with sparql update instead?

matentzn avatar Sep 15 '22 13:09 matentzn

Thanks for response. Exactly, I am now solving with a dedicated SPARQL update.

Yet, maybe it is a rather a conceptual question - maybe I am wrong, but the fact that robot repair in its current implementation leaves two identical concepts (or concepts sharing a label) in the output seems to me rather accidental than done on purpose, as its documentation speaks about handling replaceBy links (not uniqueness of labels/concepts/etc.)

In the end even the current version of the repair command implementation does not leave the resulting ontology OBO-compliant (e.g. there are still violations of the reporting rules).

But this is probably more the problem of my usage of ROBOT - I understand & use it more as a bunch of commands inside a larger pipeline containing many other non-robot steps at the end of which I would like the ontology to be OBO compliant.

psiotwo avatar Sep 15 '22 13:09 psiotwo

You are of course right in that. Maybe more precisely, it would guarantee that the output is invalid (if you moved the same label to two classes). But then again, it also add two labels, which is also.. invalid according to OBO :P I cede the point!

matentzn avatar Sep 15 '22 13:09 matentzn

@cmungall wrote repair, so I'd appreciate his thoughts on this.

jamesaoverton avatar Sep 15 '22 14:09 jamesaoverton

I'm actually confused for an orthogonal reason - repair should migrate axioms pointing to an obsolete class, it should not migrate axioms about that class.

But going back to the original question. I think by default repair should fail when replaced-bys do not satisfy the functional property. We can add an override that would do a split, with the understanding that the output would have to be manually fixed.

cmungall avatar Sep 16 '22 22:09 cmungall

I'm actually confused for an orthogonal reason - repair should migrate axioms pointing to an obsolete class, it should not migrate axioms about that class.

Does that mean that even the current behaviour is not intentional (it does migrate :c3 rdfs:label "l3"@en for :c3 deprecated)?

But going back to the original question. I think by default repair should fail when replaced-bys do not satisfy the functional property. We can add an override that would do a split, with the understanding that the output would have to be manually fixed.

Makes sense to me.

psiotwo avatar Sep 19 '22 07:09 psiotwo

Does that mean that even the current behaviour is not intentional (it does migrate :c3 rdfs:label "l3"@en for :c3 deprecated)?

Seems yes, the label should not be migrated (just logical axioms)

matentzn avatar Sep 19 '22 08:09 matentzn

Is this the only action item for this issue?

add an override that would do a split, with the understanding that the output would have to be manually fixed

As for the potential bug here... Using the --annotation-property arg migrates the given property, so I think the behavior itself is actually correct (it migrates the label because that's what was specified), but the outcome is bad.

@psiotwo when you used --annotation-property rdfs:label, what was the expected behavior you wanted? Did you want to replace the label of the replacement classes with the deprecated label?

I think there are a couple options:

  1. Keep behavior as-is and add a note to the docs that using rdfs:label is not recommended
  2. Add a new option to replace label of the replacement class: --update-label true, --replace-label true, something like that
  3. Add an option to migrate annotations under a different property: --copy-annotation <target-property> <source-property>

Option 3 would be good if you want to copy the label of the deprecated class as a synonym to the replacement class.

beckyjackson avatar Oct 01 '22 13:10 beckyjackson

@beckyjackson

when you used --annotation-property rdfs:label, what was the expected behavior you wanted? Did you want to replace the label of the replacement classes with the deprecated label?

I was hoping to see "l3" as a label for both :c1 and :c2.

It seems to me that the command is already doing too many things at present, so for me your option 1 is the most sustainable one without significant effort if the command is not supposed to compromise "single label principle"

In the "ideal world", I would have a set of technical "data transformation" robot commands (that help e.g. to do what SPARQL cannot) and a set of "curation" commands that use the technical ones for user scenarios. Repair seems to me a combination of both.

psiotwo avatar Oct 05 '22 09:10 psiotwo