
State predicates (and maybe subjects/objects) by their rdfs:label values

Open andrawaag opened this issue 9 years ago • 3 comments

I am currently trying to write shape expressions on Wikidata, specifically for diseases, genes and proteins. Wikidata uses property numbers as predicates (e.g. wdt:P31 = "instance of", wdt:P279 = "subclass of", wdt:P699 = "Disease Ontology ID", etc.). It would be convenient if it were possible to state predicates (or maybe even subjects and objects) by their labels. Would that be possible?

andrawaag avatar Dec 21 '16 21:12 andrawaag

This is similar to wanting to use a path to avoid having a BNode target. In the case of a predicate, you might imagine that this could use a similar path expression to satisfy the value. The key is to not end up pulling in all of SPARQL to do these things.

gkellogg avatar Dec 22 '16 00:12 gkellogg

We could handle this with something like property paths but I think they'd have to mean something different than what we had in mind. Our original property paths discussion is a generalization of nested properties, e.g.:

<PatientShape> {
  <name> {
    <given> LITERAL;
    <family> LITERAL
  }+
}

could be represented (with some loss of cardinality specificity) as

<PatientShape> {
  <name>/<given> LITERAL+;
  <name>/<family> LITERAL+
}

This uses the property path to traverse to a value which is then constrained by a NodeConstraint.
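To make the "loss of cardinality specificity" concrete, here is a minimal Python sketch of following a sequence path over plain triples. It is not ShEx or shex.js code; `follow_path`, the tuple representation and the sample data are all assumptions made for illustration.

```python
# Triples are (subject, predicate, object) tuples; every name here is illustrative.
def follow_path(triples, start, path):
    """Return the set of nodes reached from `start` by following `path` in order."""
    frontier = {start}
    for predicate in path:
        # Step one predicate forward from every node reached so far.
        frontier = {o for (s, p, o) in triples if s in frontier and p == predicate}
    return frontier


triples = [
    ("patient1", "name", "_:n1"),
    ("_:n1", "given", "Alice"),
    ("_:n1", "family", "Smith"),
    ("patient1", "name", "_:n2"),
    ("_:n2", "given", "Ali"),  # this name node has no family name
]

# <name>/<given> and <name>/<family> each collect values across all name nodes.
given_values = follow_path(triples, "patient1", ["name", "given"])
family_values = follow_path(triples, "patient1", ["name", "family"])
```

In this data both paths reach at least one value, so a `LITERAL+` constraint on each path would be satisfied, while the nested form would reject `_:n2` for lacking a `<family>`. The path form no longer sees which values belong to the same name node.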

Andra's use case involves identifying the predicate indirectly, which comes up a lot in vocabularies with non-human-readable terms.

<PatientShape> {
  <P31> {
    <P32> LITERAL;
    <P33> LITERAL
  }+
}

The current use of property paths wouldn't address this, though some cute trick to connect to an rdfs:label would, e.g.

<PatientShape> {
  [rdfs:label "name"] {
    [rdfs:label "given"] LITERAL;
    [rdfs:label "family"] LITERAL
  }+
}

though I think I'd prefer to borrow from Manchester Syntax/DLQuery's use of quoted labels:

<PatientShape> {
  `name` {
    `given` LITERAL;
    `family` LITERAL
  }+
}

There could be an accompanying directive to say:

BACKTICKS [rdfs:label skos:label]

to say that backticks would be resolved by looking first for the property with an rdfs:label of "name" and then for one with a skos:label of "name". That would require specifying a graph in which to find the labels, which may or may not be the same as the graph being validated, and it may or may not be worth standardizing the way we address that label. At increased cost to the user but decreased standardization cost, we could embed the mappings directly in the schema (via an import, if we invent that):

`name` <P31>
`given` <P32>
`family` <P33>
# ... shapes with refs to those constants.
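For comparison with the embedded constants, the graph-based lookup that the BACKTICKS directive implies could be sketched roughly as follows. This is a hypothetical illustration, not anything specified for ShEx: `resolve_label`, the tuple-based label graph and the choice to raise on ambiguous labels are assumptions, and language tags on labels are ignored.

```python
# The label graph is a list of (subject, predicate, object) tuples using
# prefixed names as plain strings; this representation is an assumption.
def resolve_label(label_graph, label, label_predicates):
    """Find the single term carrying `label`, trying label predicates in order."""
    for label_predicate in label_predicates:
        matches = {
            s for (s, p, o) in label_graph
            if p == label_predicate and o == label
        }
        if len(matches) == 1:
            return next(iter(matches))
        if len(matches) > 1:
            # Assumed policy: an ambiguous label is an error, not a guess.
            raise ValueError(f"label {label!r} is ambiguous: {sorted(matches)}")
    raise KeyError(label)


label_graph = [
    ("wdt:P31", "rdfs:label", "instance of"),
    ("wdt:P279", "rdfs:label", "subclass of"),
    ("ex:p1", "skos:label", "name"),
]

# Mirrors `BACKTICKS [rdfs:label skos:label]`: rdfs:label first, then skos:label.
order = ["rdfs:label", "skos:label"]
instance_of = resolve_label(label_graph, "instance of", order)
name = resolve_label(label_graph, "name", order)
```

The sketch makes the open question visible: `label_graph` has to come from somewhere, and nothing in the schema says whether it is the graph being validated or a separate one.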

ericprud avatar Dec 22 '16 09:12 ericprud

I mocked this up in a branch called backtick (shexTest tests, shex.js branch). Click protein record in the HTML demo.

This acts as a parsing stage which means the backtick stuff doesn't make it into the JSON representation or the abstract syntax. Should it?
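A rough sketch of what "a parsing stage" can mean here, using the embedded alias lines from the earlier comment: the schema text is rewritten before a ShExC parser sees it, so no backtick survives into the parsed form. This is not the code in the backtick branch; `expand_backticks` and both regular expressions are assumptions, and the sketch would also rewrite backticks inside comments or string literals.

```python
import re

# An alias declaration is assumed to be a whole line of the form: `label` <IRI>
ALIAS_LINE = re.compile(r"^\s*`([^`]+)`\s+(<[^>]*>)\s*$")
BACKTICK = re.compile(r"`([^`]+)`")


def expand_backticks(schema_text):
    """Collect alias lines, drop them, and replace backticked labels with IRIs."""
    aliases = {}
    body_lines = []
    for line in schema_text.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
        else:
            body_lines.append(line)
    body = "\n".join(body_lines)
    # An undeclared label raises KeyError rather than passing through silently.
    return BACKTICK.sub(lambda m: aliases[m.group(1)], body)


schema = """`name` <P31>
`given` <P32>
`family` <P33>
<PatientShape> {
  `name` {
    `given` LITERAL;
    `family` LITERAL
  }+
}"""

expanded = expand_backticks(schema)
```

Because the output is ordinary ShExC text, the JSON representation and abstract syntax need no changes; the trade-off is that the labels are gone by the time anything downstream could report them.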

ericprud avatar Dec 24 '16 18:12 ericprud