datatypes validation & CLI error messages

Open rapw3k opened this issue 4 years ago • 6 comments

Hi @ashleysommer Thanks a lot for the hints. Following issue https://github.com/RDFLib/pySHACL/issues/70, using the following inputs:

The shape is here: https://raw.githubusercontent.com/rapw3k/DEMETER/master/models/SHACL/demeterAgriProfile-SHACL.ttl The example data graph is here: https://box.psnc.pl/f/c95eb51962/?raw=1

Issue 1: I checked those errors in the shapes file about property shapes without sh:path. These shapes were generated automatically from owl:disjointWith statements, which Astrea translates into the statements below (https://astrea.linkeddata.es/documentation.html).

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape a sh:PropertyShape ;
   sh:class ?disjointType .

I understand this is not correct, so I can just remove the rdf:type statement and it works.

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape 
   sh:class ?disjointType .
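As a rough illustration of the check involved here, the sketch below flags subjects typed sh:PropertyShape that carry no sh:path. It uses plain Python tuples instead of an RDF library, and the example.org IRIs are hypothetical stand-ins for ?shapeUrl, ?propertyShape and ?disjointType. It is not how pySHACL itself loads shapes.

```python
# Plain-tuple sketch; a real check would query the shapes graph with an RDF library.
SH = "http://www.w3.org/ns/shacl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_shapes_missing_path(triples):
    """Return subjects typed sh:PropertyShape that have no sh:path triple."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == SH + "PropertyShape"}
    with_path = {s for s, p, o in triples if p == SH + "path"}
    return sorted(typed - with_path)


# Hypothetical IRIs standing in for ?shapeUrl, ?propertyShape and ?disjointType.
EX = "http://example.org/"
generated = [
    (EX + "shapeUrl", RDF_TYPE, SH + "NodeShape"),
    (EX + "shapeUrl", SH + "not", EX + "propertyShape"),
    (EX + "propertyShape", RDF_TYPE, SH + "PropertyShape"),
    (EX + "propertyShape", SH + "class", EX + "DisjointType"),
]

# Dropping the rdf:type statement, as in the workaround, leaves nothing to flag.
workaround = [t for t in generated if t[2] != SH + "PropertyShape"]

print(property_shapes_missing_path(generated))
print(property_shapes_missing_path(workaround))
```

Running a check like this on generated shapes before validation would point at the offending shape directly.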

An issue with the command line, however, is that I didn't get any information about the error, so I didn't know what was happening. The error I got is a runtime error (macOS), as below:

RAPMAC-3:SHACL rap$ pyshacl -s /Users/rap/GitRepositories/GitHub/DEMETER/models/SHACL/demeterAgriProfile-SHACL.ttl -e /Users/rap/GitRepositories/GitHub/DEMETER/models/cross-domain.ttl -i rdfs -a -j -f human /Users/rap/Downloads/pilot5.2-afc-observation-point-simplified.ttl
  File "/usr/local/lib/python3.9/site-packages/pyshacl/cli.py", line 169, in main
    is_conform, v_graph, v_text = validate(args.data, **validator_kwargs)
  File "/usr/local/lib/python3.9/site-packages/pyshacl/validate.py", line 390, in validate
    conforms, report_graph, report_text = validator.run()
  File "/usr/local/lib/python3.9/site-packages/pyshacl/validate.py", line 223, in run
    shapes = self.shacl_graph.shapes  # This property getter triggers shapes harvest.
  File "/usr/local/lib/python3.9/site-packages/pyshacl/shapes_graph.py", line 164, in shapes
    self._build_node_shape_cache()
  File "/usr/local/lib/python3.9/site-packages/pyshacl/shapes_graph.py", line 208, in _build_node_shape_cache
    raise ShapeLoadError(


Validator encountered a Runtime Error. Please report this to the PySHACL issue tracker.

Issue 2: I fixed the shape (accessible at the same location as above) and tried adding the target ontology into the mix with the -e option. The target ontology is available here: https://raw.githubusercontent.com/rapw3k/DEMETER/master/models/cross-domain.ttl Now I get a validation error:

RAPMAC-3:SHACL rap$ pyshacl -s /Users/rap/GitRepositories/GitHub/DEMETER/models/SHACL/demeterAgriProfile-SHACL.ttl -e /Users/rap/GitRepositories/GitHub/DEMETER/models/cross-domain.ttl -i rdfs -a -j -f human /Users/rap/Downloads/pilot5.2-afc-observation-point-simplified.ttl
Validation Report
Conforms: False
Results (1):
Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
	Severity: sh:Violation
	Source Shape: <https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115>
	Focus Node: <http://www.w3id.org/afarcloud/pCoord?lat=45.75&amp;long=4.85>
	Value Node: Literal("POINT(45.75 4.85)" = None, datatype=ns0:wktLiteral)
	Result Path: geo:hasSerialization
	Message: Value is not Literal with datatype rdfs:Literal

This is because the data graph says that:

@prefix ns0: <http://www.opengis.net/ont/geosparql#> .
ns0:asWKT "POINT(45.75 4.85)"^^ns0:wktLiteral

However, the ontology says that asWKT has range wktLiteral and is a subproperty of hasSerialization, which has range rdfs:Literal (see extracts below). Additionally, the ontology defines wktLiteral as a Datatype (and according to the spec, "Each instance of rdfs:Datatype is a subclass of rdfs:Literal"). So I don't really understand why there is a validation error.

@prefix geo: <http://www.opengis.net/ont/geosparql#> .
geo:asWKT rdf:type owl:DatatypeProperty ;
          rdfs:subPropertyOf geo:hasSerialization ;
          rdfs:domain geo:Geometry ;
          rdfs:range geo:wktLiteral ;
...
geo:hasSerialization rdf:type owl:DatatypeProperty ;
                     rdfs:domain geo:Geometry ;
                     rdfs:range rdfs:Literal ;
...
geo:wktLiteral rdf:type rdfs:Datatype ;
               rdfs:comment "A Well-known Text serialization of a geometry object."@en ;

rapw3k avatar Feb 22 '21 13:02 rapw3k

Hi @rapw3k Yes, I know error reporting, diagnostics, and debugging are very difficult in PySHACL, especially when using the CLI tool. That is something we are aiming to fix in future versions.

For issue 2: This is a complex problem, but I think I know what is causing it. Neither the GeoSPARQL ontology nor your ontology defines geo:wktLiteral to be a subclass of rdfs:Literal. I.e. there is no geo:wktLiteral rdfs:subClassOf rdfs:Literal triple in your data graph when validating, even after RDFS expansion.

I know it looks like defining geo:hasSerialization rdfs:range rdfs:Literal should imply that any value of asWKT will receive rdfs:Literal as its datatype. But actually that will try to give the value the class rdfs:Literal. In this case a Datatype and a Class are two different things, and you cannot add a class to a Literal. E.g., the RDFS inferencer cannot define "POINT(45.75 4.85)"^^geo:wktLiteral rdf:type rdfs:Literal, because a literal cannot be in the subject position of a triple. And even if it could, this would not cause the sh:datatype constraint to pass, because adding a class here does not change the datatype of the literal.

So after expanding the graph, the literal "POINT(45.75 4.85)"^^geo:wktLiteral will have the class rdfs:Literal, but it will have the datatype geo:wktLiteral, and geo:wktLiteral is not defined as a subclass of rdfs:Literal (and for reference, rdfs:Literal is itself not a datatype either).

I haven't tested it, but I think you can fix this problem by removing sh:datatype rdfs:Literal from shape <https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115>, because it already has sh:nodeKind sh:Literal which I think is what you want in this case.
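To make the distinction concrete, here is a minimal sketch of the behaviour described above, assuming sh:datatype is checked by exact IRI equality while sh:nodeKind sh:Literal only asks whether the value is a literal. The Literal class and both helper functions are hypothetical illustrations written with the standard library only; they are not pySHACL's implementation.

```python
from dataclasses import dataclass
from typing import Optional

RDFS_LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"
GEO_WKT = "http://www.opengis.net/ont/geosparql#wktLiteral"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: Optional[str] = None


def is_literal_node(node):
    # Analogous to sh:nodeKind sh:Literal: only the kind of node matters.
    return isinstance(node, Literal)


def has_exact_datatype(node, datatype_iri):
    # Strict reading of sh:datatype: the literal's datatype IRI must be identical.
    return isinstance(node, Literal) and node.datatype == datatype_iri


value = Literal("POINT(45.75 4.85)", GEO_WKT)
print(has_exact_datatype(value, RDFS_LITERAL))  # False: geo:wktLiteral is a different IRI
print(is_literal_node(value))                   # True: the value is a literal
```

Under this reading, no amount of class inference changes the result of the first check, which is why dropping sh:datatype rdfs:Literal and relying on the node kind is the simpler fix.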

ashleysommer avatar Feb 23 '21 00:02 ashleysommer

Looking again at your explanation of issue1:

?shapeUrl a sh:NodeShape ;
   sh:not ?propertyShape . 
?propertyShape a sh:PropertyShape ;
   sh:class ?disjointType .

It looks like this is a bug in Astrea's shape generation. I believe the correct output should use a NodeShape, like this:

?shapeUrl a sh:NodeShape ;
   sh:not ?shape2 . 
?shape2 a sh:NodeShape ;
   sh:class ?disjointType .

Is there a way we can submit a bug report to their software?

ashleysommer avatar Feb 23 '21 01:02 ashleysommer

Thanks a lot @ashleysommer for all the information. Regarding Astrea, I am in touch with the developers and I will point this issue out to them. Regarding the GeoSPARQL terms (we are just re-using them in our ontology via an import statement :)), removing sh:datatype rdfs:Literal from https://astrea.linkeddata.es/shapes#0f6502053608fc059a721676b6e67115 did indeed fix the issue, thanks! I really like this validator.

But getting a bit more into the reason behind this: as you said, the GeoSPARQL ontology does not define geo:wktLiteral to be a subclass of rdfs:Literal. I.e. there is no geo:wktLiteral rdfs:subClassOf rdfs:Literal triple .... What I was pointing out is that, according to the spec https://www.w3.org/TR/rdf-schema/, this would be implicit (see below), right? Nevertheless, I guess I can also point out this situation (datatypes) to the SHACL generator (Astrea).

2.4 rdfs:Datatype
rdfs:Datatype is the class of datatypes. All instances of rdfs:Datatype correspond to the RDF model of a datatype described in the RDF Concepts specification [RDF11-CONCEPTS]. rdfs:Datatype is both an instance of and a subclass of rdfs:Class. **Each instance of rdfs:Datatype is a subclass of rdfs:Literal**.
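The quoted sentence corresponds to the RDFS entailment pattern usually listed as rdfs13 in RDF 1.1 Semantics: from x rdf:type rdfs:Datatype one may infer x rdfs:subClassOf rdfs:Literal. A minimal sketch of that single rule follows, using prefixed names as plain strings in tuples rather than an RDF library. The function name is hypothetical, and nothing here says whether a given RDFS inferencer actually applies this rule.

```python
def apply_rdfs13(triples):
    """Add 'x rdfs:subClassOf rdfs:Literal' for every 'x rdf:type rdfs:Datatype'."""
    inferred = {
        (s, "rdfs:subClassOf", "rdfs:Literal")
        for s, p, o in triples
        if p == "rdf:type" and o == "rdfs:Datatype"
    }
    return set(triples) | inferred


# Extract of the ontology statements quoted earlier in this thread.
ontology = {
    ("geo:asWKT", "rdfs:range", "geo:wktLiteral"),
    ("geo:wktLiteral", "rdf:type", "rdfs:Datatype"),
}

expanded = apply_rdfs13(ontology)
print(("geo:wktLiteral", "rdfs:subClassOf", "rdfs:Literal") in expanded)  # True
```

Note that this only yields a subclass triple about the datatype; the literal's own datatype IRI stays geo:wktLiteral.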

rapw3k avatar Feb 23 '21 11:02 rapw3k

Hi @rapw3k Sorry for taking so long to respond to your previous message.

Each instance of rdfs:Datatype is a subclass of rdfs:Literal

This is interesting, and I actually didn't know that. I wonder what that means for datatype validation in pySHACL.

For example, if I have a literal "29e1"^^ex:myDataType, and ex:myDataType rdf:type rdfs:Datatype (all datatypes are instances of rdfs:Datatype), and we know that ex:myDataType rdfs:subClassOf rdfs:Literal (each instance of rdfs:Datatype is a subclass of rdfs:Literal), does that mean rdfs:Literal can be used to match ex:myDataType in a datatype constraint?

ashleysommer avatar Mar 13 '21 22:03 ashleysommer

I think I can do:

  • sh:datatype rdfs:Literal -> Matches any RDF Literal, i.e. acts the same as sh:nodeKind sh:Literal
  • sh:datatype rdfs:Datatype -> Matches any RDF Literal if it has an explicitly defined datatype, i.e. "29e1"^^ex:myDataType matches, but "29" doesn't. This one I'm not quite sure about; maybe all Literals are rdfs:Datatype implicitly even without a datatype specified.

I'll do some testing.
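The two rules proposed above could be sketched roughly as follows. This is only an illustration of the proposal as written, using a hypothetical Literal tuple and function name; it is not the code that ended up in pySHACL. The second rule follows the bullet literally, including the part marked as uncertain.

```python
from typing import NamedTuple, Optional

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal; datatype None means none was given."""
    lexical: str
    datatype: Optional[str] = None


def matches_datatype_constraint(node, constraint_iri):
    if not isinstance(node, Literal):
        return False  # IRIs and blank nodes never satisfy a datatype constraint
    if constraint_iri == RDFS + "Literal":
        return True  # proposed rule 1: behaves like sh:nodeKind sh:Literal
    if constraint_iri == RDFS + "Datatype":
        return node.datatype is not None  # proposed rule 2: explicit datatype required
    return node.datatype == constraint_iri  # otherwise: exact datatype match


typed = Literal("29e1", "http://example.org/myDataType")  # hypothetical datatype IRI
plain = Literal("29")

print(matches_datatype_constraint(typed, RDFS + "Literal"))   # True
print(matches_datatype_constraint(plain, RDFS + "Literal"))   # True
print(matches_datatype_constraint(typed, RDFS + "Datatype"))  # True
print(matches_datatype_constraint(plain, RDFS + "Datatype"))  # False
```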

ashleysommer avatar Mar 13 '21 23:03 ashleysommer

Hi @rapw3k Sorry for the delay on this. I've made the changes mentioned above, and I believe the issue reported ("issue 2" above) is resolved. Can you please test on PySHACL v0.17.1 and let me know if it fixes your specific issue?

ashleysommer avatar Oct 11 '21 06:10 ashleysommer