GH-3507: RDFS testing/wrapping framework.
GitHub issue resolved #3507
Pull request description: This is mainly infrastructure work to further assess RDFS reasoning and any future changes to it.
- Fixed literals-in-subject inferences due to range declarations (easy fix). Added `MapperX.isLiteral` to allow for testing on `X`, possibly bypassing Node materialization.
- Fixed a bug in `IteratorConcat` which would raise `IndexOutOfBoundsException` if `close()` was called without `hasNext`.
- Added a testing framework that compares all combinations of invoking `find()`. (I hope I didn't overlook an existing system for that.) Added `commons-math4` as a test-scoped dependency for the `Combinations` class.
- There is a disabled test in `AbstractTestRDFS_Extra` which fails. It uses `:directType rdfs:subPropertyOf rdf:type`. How to solve this is a separate issue. I don't think it's straightforward, so the contribution here is the test case that reveals it.
- Added infrastructure to ease wrapping `Match` implementations. Factored out the `DatasetGraphWithGraphTransform` base class from `DatasetGraphRDFS`; it can transform any Graph with a Match. There is a TDB2 test case that performs a simple RDFS inference on the NodeId level. The schema must be loaded into the graph, though, for the NodeIds to be present.
- ~~Added initial benchmark class for the RDFS reasoner.~~ [update] Removed the benchmark due to lack of scope. It can be added in a later PR to benchmark the impact of specific changes.
- Added `Iter.distinctCached` in preparation for filtering out some duplicates of MatchRDFS. However, this PR does not change any existing behavior. The goal is to extend AssemblerRDFS with an option for distinct/reduced behavior.
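
For context on the range fix in the first item: the RDFS range rule derives `o rdf:type C` from `s p o` and `p rdfs:range C`, which places `o` in subject position and is therefore not valid RDF when `o` is a literal. The following is only a minimal sketch of such a guard on a toy string-based term model. `RangeRuleSketch`, `inferRange`, and the quote-prefix convention in `isLiteral` are illustrative assumptions and do not reproduce the `MapperX.isLiteral` signature or the actual code in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RangeRuleSketch {
    static final String RDF_TYPE = "rdf:type";

    // Assumption for this sketch only: a term starting with a double quote is a literal.
    static boolean isLiteral(String term) {
        return term.startsWith("\"");
    }

    // Applies the range rule to one data triple (s, p, o).
    // 'ranges' maps a property to its declared rdfs:range classes.
    static List<String[]> inferRange(String s, String p, String o,
                                     Map<String, List<String>> ranges) {
        List<String[]> out = new ArrayList<>();
        if (isLiteral(o)) {
            // The inferred triple would have a literal as its subject: skip it.
            return out;
        }
        for (String c : ranges.getOrDefault(p, List.of())) {
            out.add(new String[] { o, RDF_TYPE, c });
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ranges = Map.of(":knows", List.of(":Person"));
        // IRI object: one inferred type triple.
        System.out.println(inferRange(":alice", ":knows", ":bob", ranges).size());
        // Literal object: nothing is inferred.
        System.out.println(inferRange(":alice", ":knows", "\"Bob\"", ranges).size());
    }
}
```
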
- [x] Tests are included.
- ~[ ] Documentation change and updates are provided for the Apache Jena website~
- [x] Commits have been squashed to remove intermediate development commit messages.
- [x] Key commit messages start with the issue number (GH-xxxx)
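
One possible shape of the `find()` comparison mentioned in the description, as a rough sketch: for every triple in the full result, enumerate all 2^3 bound/wildcard patterns and check that `find()` with that pattern agrees with filtering the full scan. This is an assumed reading of "all combinations"; the PR's framework uses the `Combinations` class from `commons-math4`, whereas this sketch uses a plain bitmask. `FindPatternSketch`, `Finder`, `scanFinder`, and `checkAll` are hypothetical names, and triples are modelled as lists of three strings with `null` as the wildcard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class FindPatternSketch {
    /** The operation under test; a null argument is a wildcard. */
    interface Finder {
        List<List<String>> find(String s, String p, String o);
    }

    /** Reference semantics: a null slot in the pattern matches anything. */
    static boolean matches(List<String> triple, String[] pattern) {
        for (int i = 0; i < 3; i++) {
            if (pattern[i] != null && !pattern[i].equals(triple.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** A simple linear-scan Finder over in-memory data. */
    static Finder scanFinder(List<List<String>> data) {
        return (s, p, o) -> {
            String[] pattern = { s, p, o };
            List<List<String>> out = new ArrayList<>();
            for (List<String> t : data) {
                if (matches(t, pattern)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Checks all 8 patterns per triple; returns the number of patterns checked. */
    static int checkAll(Finder finder) {
        List<List<String>> all = finder.find(null, null, null);
        int checked = 0;
        for (List<String> t : all) {
            for (int mask = 0; mask < 8; mask++) {
                // Bit i set: slot i is bound to the triple's term, otherwise wildcard.
                String[] pattern = new String[3];
                for (int i = 0; i < 3; i++) {
                    if ((mask & (1 << i)) != 0) {
                        pattern[i] = t.get(i);
                    }
                }
                List<List<String>> expected = new ArrayList<>();
                for (List<String> u : all) {
                    if (matches(u, pattern)) {
                        expected.add(u);
                    }
                }
                List<List<String>> actual = finder.find(pattern[0], pattern[1], pattern[2]);
                // Compared as sets, so duplicate counts are ignored in this sketch.
                if (!new HashSet<>(expected).equals(new HashSet<>(actual))) {
                    throw new AssertionError("Mismatch for pattern " + Arrays.toString(pattern));
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(
                List.of(":a", ":p", ":b"),
                List.of(":a", ":q", ":c"));
        System.out.println("patterns checked: " + checkAll(scanFinder(data)));
    }
}
```

Comparing as sets keeps the sketch short; a harness that also cares about duplicate answers would compare multisets instead.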
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.
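
As a note on the `IteratorConcat` item in the description: one way a concatenating iterator can fail on `close()` before any `hasNext()` is when its cursor is only positioned lazily and `close()` indexes the list of sub-iterators with the unpositioned cursor. The sketch below illustrates that pattern and a guard against it. It is an assumption about the general failure mode, not a copy of Jena's `IteratorConcat`; `ConcatSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ConcatSketch<T> implements Iterator<T>, AutoCloseable {
    private final List<Iterator<T>> parts = new ArrayList<>();
    private int idx = -1;            // -1 until hasNext() first positions the cursor
    private boolean closed = false;

    void add(Iterator<T> it) {
        parts.add(it);
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false;
        }
        if (idx < 0) {
            idx = 0;
        }
        // Skip over exhausted sub-iterators.
        while (idx < parts.size()) {
            if (parts.get(idx).hasNext()) {
                return true;
            }
            idx++;
        }
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return parts.get(idx).next();
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        // Clamp the start index: parts.get(-1) would throw IndexOutOfBoundsException
        // when close() is called before any hasNext().
        for (int i = Math.max(idx, 0); i < parts.size(); i++) {
            Iterator<T> it = parts.get(i);
            if (it instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) it).close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        ConcatSketch<Integer> c = new ConcatSketch<>();
        c.add(List.of(1, 2).iterator());
        c.close();                        // no hasNext() beforehand, must not throw
        System.out.println(c.hasNext());  // a closed iterator reports no elements
    }
}
```
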
I added a test case that uses the infrastructure to perform a simple RDFS inference on the NodeId level. It's limited because it requires the ontology and the built-in properties to be present in the graph. RDFS and OWL terms would have to be pre-populated in the NodeTable. At least things can now be wired up in the way that seemed to be the intended one, by having the generic `X`, and one can play around with it.
The relevant snippet is this:
```java
// Add wrapping on NodeId level.
DatasetGraph baseDsg = TDB2Factory.createDataset().asDatasetGraph();
MapperX<NodeId, Tuple3<NodeId>> mapper = MapperXTDB.create(baseDsg);
ConfigRDFS<NodeId> configRDFS = RDFSFactory.setupRDFS(schema, mapper);
DatasetGraph rdfsDsg = new DatasetGraphWithGraphTransform(baseDsg,
    g -> GraphMatch.adapt(g, new MatchRDFSWrapper<>(configRDFS, MatchTDB.wrap(g))));
```
Note that this only demonstrates a working wiring within the NodeId realm. It does not leverage QueryEngineTDB to run filters and aggregates on the NodeId level; that tighter integration would be future work.
Please could you bring this up-to-date with 6.0.0? Thanks.
> Fixed literals-in-subject inferences due to range declarations (easy fix)
To help me, where exactly is this?
> It uses `:directType rdfs:subPropertyOf rdf:type`.
In the schema presumably.
Subproperties of the RDFS vocabulary (rdfs:subClassOf, rdfs:domain, rdfs:range) are not intended to be supported by this data-centric RDFS system.
"messing around with the scaffolding"
It ought to raise an error in all four cases.