Explore relationship between templates and RDF Shapes/ShEx

Open cmungall opened this issue 6 years ago • 15 comments

There are similarities and differences in semantics and use cases between templates (dosdps, robot, ottr) and shapes (shex, shacl).

We should explore these and formalize the linkages, and possibly even explore whether a subsuming framework is possible.

Some background: This is being driven in part by the go-shapes schema, which is used to validate GO-CAMs but is increasingly becoming a general source of all truth about GO. Originally we had shapes only for obo-core level classes such as BiologicalProcess, CellComponent. But we are seeing the need for deeper subclasses, e.g. a transport subclass that we can parameterize with start-location and end-location.

This is obviously partly duplicative with the dosdp templates for go. This is not super-satisfying. Aside from duplication of effort, the worst effect is duplication of mindshare and confusion over not having one source of truth.

A current very rough proposal:

  • have a convention for annotating shex with the information needed to put it on par with dosdps. Call this t-shex
    • shex is very nice in that any part of a shape can carry annotations
    • we can imagine annotating the range constraint with a variable name and the shape with a generator string
  • write a t-shex to dosdp (or robot template header) converter
    • note the shex would be abox-based, but it would be trivial to generalize to a defining tbox expression
    • OR adapt dosdp-tools to go from t-shex. This gets around a whole bunch of issues such as optional variables
  • gradually migrate patterns to t-shex

E.g.

<Transport> <BiologicalProcess> AND EXTRA a {
  has-start-location: <CellComponent> // dosdp:var "start"
  has-end-location: <CellComponent> // dosdp:var "end"
} // rdfs:comment "this is for transport"
     dosdp:labelGen "transport [from {{start}}] [to {{end}}]"
     dosdp:textdefGen "..."

no need for an equiv axiom generator: all the information is in the abox pattern

You could feed this either tuples (with optional fillers) or actual subgraphs, in order to do class generation
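The tuple-driven case can be sketched in a few lines of Python. This is only an illustration: it assumes the square brackets in the labelGen string mark segments that are dropped when their variable has no filler, and the names generate_label and LABEL_GEN are made up for this sketch rather than taken from dosdp-tools or any ShEx library.

```python
import re

# The generator string from the example shape above.
LABEL_GEN = "transport [from {{start}}] [to {{end}}]"

_OPTIONAL = re.compile(r"\[([^\[\]]*)\]")  # a bracketed, optional segment
_VAR = re.compile(r"\{\{(\w+)\}\}")        # a {{variable}} placeholder


def generate_label(template, fillers):
    """Fill {{var}} slots; drop any [optional] segment whose filler is missing."""

    def fill(text):
        return _VAR.sub(lambda m: fillers[m.group(1)], text)

    def optional(match):
        segment = match.group(1)
        names = _VAR.findall(segment)
        # Keep the segment only if every variable in it has a filler.
        if all(fillers.get(name) for name in names):
            return fill(segment)
        return ""

    label = _OPTIONAL.sub(optional, template)
    label = fill(label)  # variables outside brackets are required
    return " ".join(label.split())  # collapse whitespace left by dropped segments
```

Feeding it the tuple (start=nucleus, end=cytosol) gives "transport from nucleus to cytosol"; leaving out the start filler gives "transport to cytosol".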

I am also assuming that in the future there will be many tools for doing things like driving form interfaces from shex/shacl (which are partly interconvertible)

I think there are many advantages to doing this for GO. We are becoming more abox-based. A lot of the standard tooling in ShEx is really nice, and it's a widely adopted standard.

This could just be creating busy work for other uses of dosdps, e.g. they have been phenomenally successful for phenotype reconciliation.

The counterpoint to all of this is skepticism about finding the One True Framework to bind them all (biolinkml?)

See Also

  • discussion on semantic-web mail list about relationship between shapes and generative frameworks: https://lists.w3.org/Archives/Public/semantic-web/2019Nov/0004.html

cc

@vanaukenk @dosumis @matentzn @balhoff @goodb @ukemi @jamesaoverton @beckyjackson

cmungall avatar Nov 12 '19 21:11 cmungall

We will make this the topic of our next ODK call. I must admit that I lack the background to really understand what you are proposing here, but I generally want to start using shapes for the phenotype reconciliation effort soon, so it makes sense to coordinate with GO and DOSDP.

matentzn avatar Nov 13 '19 07:11 matentzn

Makes sense. This was, of course, one of the motivating use-cases for DOSDPs in the first place - see instance_graph spec on DOSDP-schema.

dosumis avatar Nov 13 '19 11:11 dosumis

Interesting, I didn't know you were already considering using shapes.

cmungall avatar Nov 13 '19 13:11 cmungall

May only be of historical interest, but spec here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L411

& here:

https://github.com/INCATools/dead_simple_owl_design_patterns/blob/master/spec/DOSDP_schema_full.yaml#L153

@balhoff - did you ever get around to writing code for this? I think we discussed it at the time.

dosumis avatar Nov 13 '19 13:11 dosumis

This is quite interesting. I'm a little lost on details. I think you are proposing t-shex to be the ground truth ... right? That is, dosdp would be transformed to t-shex. Or is it the other way round: t-shex would be transformed to dosdp?

wdduncan avatar Nov 13 '19 16:11 wdduncan

I think you are proposing t-shex to be the ground truth ... right?

Correct

t-shex would be transformed to dosdp?

Correct

(of course there may be a bootstrapping and synchronization step where we iterate with the reverse)

And to be clear "t-shex" is nothing more than standard shex with some conventions as to how it is annotated (hmm, can we model that in shex itself, that's the kind of meta question @hsolbrig loves)

cmungall avatar Nov 13 '19 18:11 cmungall

Ok. So you are proposing to use t-shex to generate data by translating the t-shex into dosdp, and then the dosdp to OWL/RDF?

wdduncan avatar Nov 14 '19 19:11 wdduncan

This is possibly the most expedient path.

But note that t-shex->dosdp is compilation/translation

There isn't really a dosdp->owl translation as such. The dosdp specifies how to translate tuples/rows to OWL.
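To make the distinction concrete, here is a rough Python sketch that treats a pattern as a function from rows to axioms. The pattern dictionary is loosely modelled on the text/vars layout of a DOSDP equivalentTo block, the defined_class column and the EX: identifiers are illustrative assumptions, and the output is a Manchester-style string rather than parsed OWL. It is not how dosdp-tools is implemented.

```python
import csv
import io

# Hypothetical pattern: a printf-style class expression plus its variables.
PATTERN = {
    "equivalentTo": {
        "text": "'transport' and ('has start location' some %s) "
                "and ('has end location' some %s)",
        "vars": ["start", "end"],
    }
}

# Hypothetical input table: one row per class to generate.
ROWS_TSV = "defined_class\tstart\tend\nEX:0000001\tEX:nucleus\tEX:cytosol\n"


def row_to_axiom(pattern, row):
    """Apply the pattern to a single row, yielding one axiom string."""
    spec = pattern["equivalentTo"]
    fillers = tuple(row[var] for var in spec["vars"])
    return f"Class: {row['defined_class']} EquivalentTo: {spec['text'] % fillers}"


def rows_to_axioms(pattern, tsv_text):
    """The pattern alone yields nothing; axioms only appear once rows are supplied."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row_to_axiom(pattern, row) for row in reader]
```

Under this reading, t-shex->dosdp changes only the pattern, and the row-to-OWL step stays as it is.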

cmungall avatar Nov 14 '19 21:11 cmungall

I think this approach is fine if you're willing to limit design pattern expressivity: patterns entirely EquivalentClass with no nested class expressions. The one case where I think this would be a loss for GO is GCIs used to align branches, e.g. I still think patterns with GCIs are the best way to align CC organization/assembly/disassembly in BP with the CC hierarchy. IIRC, I even wrote patterns for this.

dosumis avatar Nov 15 '19 09:11 dosumis

Think this approach has the advantage that it should be reasonably transparent to those used to building GO-CAM models in a way that perhaps DOSDPs have failed to be. OTOH - isn't there a danger that it will result in unsafe patterns - that apply to some broad subset of cases but cause misclassification outside of these? To prevent this I think you'd still need a strong editorial step between deriving DOSDPs from ShEx patterns and implementing them in the ontology.

dosumis avatar Nov 15 '19 09:11 dosumis

Do you still have those GCI examples? I don't see them in the current ones: https://github.com/geneontology/go-ontology/blob/master/src/design_patterns/cc_disassembly.yaml

My so far vague thoughts are that we can always bring across any aspect of dosdps into t-shex annotations, and just treat as an alternate syntax for dosdps.

But this isn't ideal if we want to embrace the abox shape as being the 'source of truth': we end up mixing the two in a slightly redundant way

I think the GCIs might be expressible in a more abox-centric way that can then be autogeneralized to tboxes, but this remains to be determined.

isn't there a danger that it will result in unsafe patterns

Would this be in the tbox generalization step? Quite possibly; I need to think of some examples.

cmungall avatar Nov 15 '19 18:11 cmungall

Do you still have those GCI examples?

See https://github.com/geneontology/go-ontology/blob/master/src/design_patterns/cc_organization.yaml#L49

dosumis avatar Dec 04 '19 17:12 dosumis

Another possibility here is to build this in to biolinkml yaml, cc @hsolbrig

https://github.com/biolink/biolinkml -- note completely independent of biolink itself

A related ticket: https://github.com/biolink/biolinkml/issues/128

classes:
  transport:
    is_a: biological process
    slots:
      - start location
      - end location
    templates:
      name:
        as string value: "transport from {start location} to {end location}"
      definition:
        as string value: "...."
...

with equivalence/logdef pattern inferred automatically

for GCIs, how about just specifying these directly as abox rules and inferring a SPARQL update?

e.g.

?cp results-in-org-of ?c1, ?c1 part-of ?c
->
exists: ?p
?cp part-of ?p
?p a :organization, ?p has-input ?c

there is a deterministic translation of this structure to an ugly sparql tbox update command
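As a rough illustration of how mechanical the translation is, the Python sketch below compiles the rule into a SPARQL update at the abox level only. The tbox version mentioned above would insert OWL restriction structures instead and is not attempted here. The function name, the triple encoding, and the default ":" prefix (whose PREFIX declaration is omitted) are assumptions of this sketch.

```python
def rule_to_sparql_update(body, head, existentials):
    """Compile a rule into a SPARQL INSERT/WHERE string.

    body, head: lists of (subject, predicate, object) strings.
    existentials: variables that occur only in the head.
    """

    def head_term(term):
        # An existential variable becomes a blank node in the INSERT template;
        # SPARQL Update creates a fresh blank node for every WHERE solution.
        return "_:" + term[1:] if term in existentials else term

    def block(triples, convert):
        return "\n".join(
            f"  {convert(s)} {p} {convert(o)} ." for s, p, o in triples
        )

    insert_part = "INSERT {\n" + block(head, head_term) + "\n}\n"
    where_part = "WHERE {\n" + block(body, lambda term: term) + "\n}"
    return insert_part + where_part


# The rule from the comment above, encoded as triples.
BODY = [("?cp", ":results-in-org-of", "?c1"), ("?c1", ":part-of", "?c")]
HEAD = [
    ("?cp", ":part-of", "?p"),
    ("?p", "a", ":organization"),
    ("?p", ":has-input", "?c"),
]

UPDATE = rule_to_sparql_update(BODY, HEAD, {"?p"})
```

As written, the update adds a new node each time it is run; a real implementation would probably guard the WHERE clause with FILTER NOT EXISTS to stay idempotent.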

cmungall avatar Jun 26 '20 17:06 cmungall

Thinking more about using the abox representation as primary (and using something like uml or biolinkml or shex) with derivations of tbox equiv axioms, @matentzn posed the question of what to do about complex patterns where the desired tbox expression employs nesting

I would do this through simple composition of standard class definitions

e.g. for the subq case, we may have

classes:
  phenotype:
    slot_usage:
      has part:
        range: atomic phenotype
    to_str: "{atomic phenotype}"
  atomic phenotype:
    slots: [inheres in, type, qualifier]
  morphology phenotype:
    is_a: atomic phenotype
    slot_usage:
      type:
        range: morphology class
      inheres in:
        range: anatomical structure
    to_str: "{inheres in} morphology"
  abnormal morphology phenotype:
    is_a: morphology phenotype
    slot_usage:
      qualifier:
        range: abnormal class
    to_str: "abnormal {inheres in} morphology"
etc

this constrains the shape of aboxes and gives string gen/parse. E.g. "morphology of patient123s left femur".

the shape of tboxes follows directly from this, together with patterns for equivalence axioms, no need for writing owl in macros.
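The string-generation half of this can be sketched in Python as follows. The sketch assumes that to_str is a class-level key inherited along is_a, with the most specific definition winning; to_str comes from the example above and is not an existing biolinkml feature, and the CLASSES dictionary is a trimmed, hand-written copy of that example.

```python
import re

# Trimmed copy of the schema above, keeping only what string generation needs.
CLASSES = {
    "atomic phenotype": {"slots": ["inheres in", "type", "qualifier"]},
    "morphology phenotype": {
        "is_a": "atomic phenotype",
        "to_str": "{inheres in} morphology",
    },
    "abnormal morphology phenotype": {
        "is_a": "morphology phenotype",
        "to_str": "abnormal {inheres in} morphology",
    },
}


def resolve_to_str(classes, name):
    """Walk up the is_a chain and return the nearest to_str template."""
    while name is not None:
        cls = classes[name]
        if "to_str" in cls:
            return cls["to_str"]
        name = cls.get("is_a")
    raise KeyError("no to_str template found")


def render(classes, name, slot_values):
    """Substitute {slot name} placeholders with the given slot fillers."""
    template = resolve_to_str(classes, name)
    return re.sub(r"\{([^{}]+)\}", lambda m: slot_values[m.group(1)], template)
```

For instance, rendering "abnormal morphology phenotype" with the slot filler inheres in = femur yields "abnormal femur morphology".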

cmungall avatar Jul 09 '20 21:07 cmungall

Here is an example of using biolinkml as a template language for a chemical ontology: https://github.com/cmungall/chemistry-ontology

cmungall avatar Jul 15 '20 01:07 cmungall