Support for RDF Collections (lists)
While writing the RDFS for ShEx, I had to define expression and expressions as separate properties, because expressions takes a list of TripleExpression while expression takes a single value. Furthermore, expressions must have at least two elements. How would we write a shape to validate this?
Certainly, one way is to use the rdf:first and rdf:rest properties:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>
shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions BNODE @ex:ListOfTwoExpressions;
}
# A list with at least two members.
ex:ListOfTwoExpressions CLOSED {
  rdf:type [rdf:List]?;
  rdf:first @shex:TripleExpression;
  rdf:rest {
    rdf:first @shex:TripleExpression;
    rdf:rest [rdf:nil] OR @ex:ListOfExpressions;
  }
}
# A list with at least one member.
ex:ListOfExpressions CLOSED {
  rdf:type [rdf:List]?;
  rdf:first @shex:TripleExpression;
  rdf:rest [rdf:nil] OR @ex:ListOfExpressions;
}
shex:TripleExpression {
  rdf:type [shex:OneOf shex:EachOf shex:Inclusion shex:TripleConstraint];
}
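As a rough illustration of what the shape above has to check, here is a minimal Python sketch that walks the rdf:first/rdf:rest chain by hand. It is not a ShEx implementation: the tuple-based triple representation, the function names `read_collection` and `has_at_least`, and the sample blank node labels are all assumptions made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def read_collection(triples, head):
    """Return the members of the RDF Collection starting at `head`, in order.

    Raises ValueError if the list is malformed: a cell without exactly one
    rdf:first and exactly one rdf:rest, or a cycle.
    """
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"malformed list cell {node}")
        members.append(firsts[0])
        node = rests[0]
    return members


def has_at_least(triples, head, minimum):
    """True if `head` is a well-formed list with at least `minimum` members."""
    try:
        return len(read_collection(triples, head)) >= minimum
    except ValueError:
        return False


# Hypothetical data: _:l1 -> _:l2 -> rdf:nil, listed out of order on purpose.
triples = [
    ("_:l2", RDF_REST, RDF_NIL),
    ("_:l1", RDF_FIRST, "_:tc1"),
    ("_:l2", RDF_FIRST, "_:tc2"),
    ("_:l1", RDF_REST, "_:l2"),
]
print(read_collection(triples, "_:l1"))
print(has_at_least(triples, "_:l1", 2))
print(has_at_least(triples, "_:l2", 2))
```

The member order comes from following rdf:rest, not from the order in which the triples happen to be stored. The sketch deliberately leaves out the per-member check against shex:TripleExpression.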
But for what seems like such a common pattern, this is pretty heavyweight. One thought I had was to add a new node kind, LIST, which would serve a dual purpose: it would verify that the property value is a valid RDF Collection, and it would change value and cardinality checking so that they apply to the list members. The shape might look something like the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>
shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions LIST @shex:TripleExpression{2,}
}
The downside of this is that we are overloading nodeKind to modify other behavior in the TripleConstraint. Also, we can't express both the cardinality of a list (how many members it has) and the number of lists allowed as values of the property in the TripleConstraint (which I think is really of only theoretical utility, but ...).
Eric came up with something that extends the grammar to look something like the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX shex: <https://shexspec.github.io/ns/>
PREFIX ex: <http://schema.example/>
shex:EachOf CLOSED {
  rdf:type [shex:EachOf];
  shex:expressions LIST(@shex:TripleExpression{2,}){1,3}
}
which would not use a node kind but a new LIST functional syntax, so you could say that shex:expressions must have at least one and no more than three distinct lists, each of which must have at least two entries conforming to the shape shex:TripleExpression.
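To make the two levels of cardinality concrete, here is a minimal Python sketch of how the inner {2,} and the outer {1,3} might be checked. It assumes each property value has already been read into a Python list, and `check_list_values`, `in_range`, and `is_triple_expression` are hypothetical names invented for this illustration, not part of any ShEx library.

```python
def in_range(count, minimum, maximum):
    """True if minimum <= count <= maximum; a maximum of None is unbounded."""
    return count >= minimum and (maximum is None or count <= maximum)


def check_list_values(lists, element_ok, element_card, list_card):
    """Check the values of one property, each already read as a Python list.

    element_card bounds the length of every list (the inner {2,});
    list_card bounds how many lists the property has (the outer {1,3}).
    """
    if not in_range(len(lists), *list_card):
        return False
    return all(
        in_range(len(members), *element_card)
        and all(element_ok(m) for m in members)
        for members in lists
    )


# Hypothetical stand-in for "conforms to shex:TripleExpression".
def is_triple_expression(member):
    return member.startswith("tc")


INNER, OUTER = (2, None), (1, 3)  # LIST(@shex:TripleExpression{2,}){1,3}
print(check_list_values([["tc1", "tc2"]], is_triple_expression, INNER, OUTER))
print(check_list_values([["tc1"]], is_triple_expression, INNER, OUTER))
print(check_list_values([], is_triple_expression, INNER, OUTER))
```

Only the first call passes: the second list is too short for the inner bound, and the third has no list at all, which violates the outer bound. With the LIST node kind variant there would be no outer pair to pass, which is exactly the limitation described earlier.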
Personally, I think overloading nodeKind works well, and if someone really has a need to talk both about cardinality of lists as well as list elements, they can fall back to rdf:first/rest primitives.
This feature is very interesting. About the grammar, I understand that
shex:expressions LIST @shex:TripleExpression{2,}
may be ambiguous to parse and understand: a parser would not know whether the {2,} refers to the shex:expressions arc or to the size of the list, so some parentheses are probably needed:
shex:expressions LIST(@shex:TripleExpression{2,}){1,3}
As I understand it, the {2,} declares the expected size of the list (in this case 2 or more elements).
If we allow any cardinality expression, should we also allow +, * and ?, so that, for example, ? would mean either a list of one element or rdf:nil?
And in the case of {0,0}, does it mean that the list is the value rdf:nil?
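One possible reading of these questions, sketched in Python: treat every cardinality as a (min, max) pair applied to the list length, with the empty list standing for rdf:nil. This is only an assumption about how the proposal could behave, not something the thread settles, and `parse_cardinality` and `list_length_ok` are hypothetical helper names.

```python
import re

SHORTHAND = {"?": (0, 1), "*": (0, None), "+": (1, None)}


def parse_cardinality(text):
    """Turn '?', '*', '+', '{m}', '{m,}', '{m,*}' or '{m,n}' into (min, max).

    A max of None means unbounded.
    """
    if text in SHORTHAND:
        return SHORTHAND[text]
    match = re.fullmatch(r"\{(\d+)(,(\d+|\*)?)?\}", text)
    if match is None:
        raise ValueError(f"not a cardinality: {text!r}")
    minimum, comma, maximum = match.groups()
    if comma is None:           # {m}
        return int(minimum), int(minimum)
    if maximum in (None, "*"):  # {m,} or {m,*}
        return int(minimum), None
    return int(minimum), int(maximum)


def list_length_ok(members, cardinality):
    """Apply a cardinality to the length of a list; [] stands for rdf:nil."""
    minimum, maximum = parse_cardinality(cardinality)
    count = len(members)
    return count >= minimum and (maximum is None or count <= maximum)


print(list_length_ok([], "?"))          # rdf:nil
print(list_length_ok(["a", "b"], "?"))  # two members
print(list_length_ok([], "{0,0}"))      # rdf:nil
print(list_length_ok(["a"], "{0,0}"))   # one member
```

Under this reading, ? accepts rdf:nil or a one-element list, and {0,0} accepts only rdf:nil.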
This captures a common pattern and avoids repetition (DRY), so I 👍 adding this feature to the next release of ShEx.
This feature is very useful, but I would also like to suggest a shorthand, which would look like
shex:expressions @shex:TripleExpression{1,3}~
that is equal to
shex:expressions LIST(@shex:TripleExpression){1,3}
And for sequences I would like to use
shex:expressions @shex:TripleExpression{1,3}=
that is equal to
shex:expressions SEQ(@shex:TripleExpression){1,3}
Will the list members be ordered?
Yes, as dictated by the semantics of RDF Collections.