
GH-3507: RDFS testing/wrapping framework.

Open Aklakan opened this issue 2 months ago • 1 comments

GitHub issue resolved #3507

Pull request Description: This is mainly infrastructure work to further assess RDFS reasoning - and any future changes to it.

  • Fixed literals-in-subject inferences due to range declarations (easy fix). Added MapperX.isLiteral to allow for testing on X possibly bypassing Node-materialization.

  • Fixed a bug in IteratorConcat which would raise IndexOutOfBoundsException if close() was called without a prior call to hasNext().

  • Added testing framework that compares all combinations of invoking find(). (I hope I didn't overlook an existing system for that). Added commons-math4 as a test-scoped dependency for the Combinations class.

  • There is a disabled test in AbstractTestRDFS_Extra which fails. It uses :directType rdfs:subPropertyOf rdf:type. How to solve this is a separate issue. I don't think it's straightforward, so the contribution here is the test case that reveals it.

  • Added infrastructure to ease wrapping Match implementations. Factored out DatasetGraphWithGraphTransform base class from DatasetGraphRDFS that can transform any Graph with a Match. There is a TDB2 test case that performs a simple RDFS inference on the NodeId level. The schema must be loaded into the graph though for the NodeIds to be present.

  • ~~Added initial benchmark class for the RDFS reasoner.~~ [update] Removed the benchmark due to lack of scope. Can be added with a later PR to benchmark impact of specific changes.

  • Added Iter.distinctCached in preparation to filter out some duplicates of MatchRDFS. However, this PR does not change any existing behavior. The goal is to extend AssemblerRDFS with an option for distinct/reduced behavior.
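
The IteratorConcat failure mode from the list above can be illustrated with a minimal, self-contained sketch. This is not Jena's IteratorConcat: the class `ConcatSketch`, its fields, and the use of -1 as the "not yet positioned" cursor value are assumptions made for illustration. It only shows why close() must not assume that hasNext() has already positioned the cursor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a concatenating iterator; not Jena's IteratorConcat.
final class ConcatSketch<T> implements Iterator<T>, AutoCloseable {
    private final List<Iterator<T>> parts = new ArrayList<>();
    // Assumption: -1 means "hasNext() has not positioned the cursor yet".
    private int idx = -1;
    private boolean closed = false;

    void add(Iterator<T> it) { parts.add(it); }

    @Override
    public boolean hasNext() {
        if (closed) return false;
        if (idx < 0) idx = 0;
        // Skip over exhausted parts.
        while (idx < parts.size()) {
            if (parts.get(idx).hasNext()) return true;
            idx++;
        }
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return parts.get(idx).next();
    }

    @Override
    public void close() {
        if (closed) return;
        closed = true;
        // Guard: an unguarded parts.get(idx) would throw IndexOutOfBoundsException
        // while idx is still -1 or when no parts were added.
        // Close the current part and all remaining parts.
        for (int i = Math.max(idx, 0); i < parts.size(); i++) {
            Iterator<T> it = parts.get(i);
            if (it instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) it).close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        ConcatSketch<String> it = new ConcatSketch<>();
        it.add(List.of("a", "b").iterator());
        it.close(); // safe even though hasNext() was never called
        System.out.println(it.hasNext());
    }
}
```

The design choice in this sketch is to clamp the start index in close() rather than to force a hasNext() call, so closing an untouched iterator has no side effects on the underlying parts beyond closing them.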


  • [x] Tests are included.
  • ~[ ] Documentation change and updates are provided for the Apache Jena website~
  • [x] Commits have been squashed to remove intermediate development commit messages.
  • [x] Key commit messages start with the issue number (GH-xxxx)
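
The idea behind the find() comparison tests mentioned in the description can be sketched without any Jena types. The following is a simplified illustration, not the framework added by this PR: `Finder`, `reference`, and `compare` are hypothetical names, triples are plain lists of three strings, and a bitmask replaces the commons-math4 Combinations class for choosing which positions are bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FindCombinationsSketch {
    /** Hypothetical pattern matcher under test; a null argument means "any". */
    interface Finder {
        Set<List<String>> find(String s, String p, String o);
    }

    /** Reference behavior: filter a fully materialized set of triples. */
    static Finder reference(Set<List<String>> triples) {
        return (s, p, o) -> {
            Set<List<String>> out = new HashSet<>();
            for (List<String> t : triples) {
                if ((s == null || s.equals(t.get(0)))
                        && (p == null || p.equals(t.get(1)))
                        && (o == null || o.equals(t.get(2)))) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Probes every bound/wildcard combination and returns the patterns that disagree. */
    static List<List<String>> compare(Set<List<String>> probes, Finder expected, Finder actual) {
        List<List<String>> failures = new ArrayList<>();
        Set<List<String>> seen = new HashSet<>();
        for (List<String> t : probes) {
            for (int mask = 0; mask < 8; mask++) {
                String s = (mask & 1) != 0 ? t.get(0) : null;
                String p = (mask & 2) != 0 ? t.get(1) : null;
                String o = (mask & 4) != 0 ? t.get(2) : null;
                List<String> pattern = Arrays.asList(s, p, o); // Arrays.asList permits nulls
                if (!seen.add(pattern)) continue;              // probe each pattern once
                if (!expected.find(s, p, o).equals(actual.find(s, p, o))) {
                    failures.add(pattern);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Set<List<String>> closure = Set.of(
                List.of(":x", "rdf:type", ":Dog"),
                List.of(":x", "rdf:type", ":Animal"));
        Finder ref = reference(closure);
        System.out.println(compare(closure, ref, ref).isEmpty());
    }
}
```

In a real test, `expected` would be backed by a materialized closure and `actual` by the reasoner's find(); any pattern returned by `compare` pinpoints the access pattern on which the two differ.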

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Aklakan avatar Oct 10 '25 20:10 Aklakan

I added a test case that uses the infrastructure to perform a simple RDFS inference on the NodeId level. It's limited because it requires the ontology and the built-in properties to be present in the graph: RDFS and OWL terms would have to be pre-populated in the NodeTable. At least things can now be wired up in the way that seems to have been intended by having the generic X, and one can play around with it.

The relevant snippet is this:

// Add wrapping on NodeId level.
DatasetGraph baseDsg = TDB2Factory.createDataset().asDatasetGraph();
MapperX<NodeId, Tuple3<NodeId>> mapper = MapperXTDB.create(baseDsg);
ConfigRDFS<NodeId> configRDFS = RDFSFactory.setupRDFS(schema, mapper);

DatasetGraph rdfsDsg = new DatasetGraphWithGraphTransform(baseDsg,
    g -> GraphMatch.adapt(g, new MatchRDFSWrapper<>(configRDFS, MatchTDB.wrap(g))));

Note that this only demonstrates a working wiring within the NodeId realm. It does not leverage QueryEngineTDB to run filters and aggregates on the NodeId level; that tighter integration would be future work.
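
To make the "generic X" idea concrete, here is a minimal sketch of a matcher wrapper that adds rdfs:subClassOf inference while working purely on an opaque term type. It is a simplified illustration and not Jena's Match or MatchRDFSWrapper API: the `Match` interface, `over`, and `withSubClassOf` are hypothetical, and the superclass map is assumed to already contain the transitive closure. With `X = Integer` standing in for NodeId, it also shows why rdf:type and the class terms need ids before inference can run.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

final class IdLevelRdfsSketch {
    /** Hypothetical, simplified triple matcher over terms of type X; null means wildcard. */
    interface Match<X> {
        Stream<List<X>> match(X s, X p, X o);
    }

    /** Base matcher over stored triples. */
    static <X> Match<X> over(Collection<List<X>> triples) {
        return (s, p, o) -> triples.stream().filter(t ->
                (s == null || s.equals(t.get(0)))
                        && (p == null || p.equals(t.get(1)))
                        && (o == null || o.equals(t.get(2))));
    }

    /**
     * Wrapper adding subclass inference for rdf:type triples without ever
     * materializing X into a node. Assumes superClasses is transitively closed.
     */
    static <X> Match<X> withSubClassOf(Match<X> base, X rdfType, Map<X, Set<X>> superClasses) {
        return (s, p, o) -> {
            Stream<List<X>> direct = base.match(s, p, o);
            if (p != null && !p.equals(rdfType)) return direct;
            // For every stored (x rdf:type C), emit (x rdf:type D) for each superclass D of C.
            Stream<List<X>> inferred = base.match(s, rdfType, null)
                    .flatMap(t -> superClasses.getOrDefault(t.get(2), Set.of()).stream()
                            .map(d -> List.of(t.get(0), rdfType, d)))
                    .filter(t -> o == null || o.equals(t.get(2)));
            return Stream.concat(direct, inferred).distinct();
        };
    }

    public static void main(String[] args) {
        // Ids are arbitrary: 1 = rdf:type, 10 = :Dog, 11 = :Animal, 100 = :x.
        Match<Integer> base = over(List.of(List.of(100, 1, 10)));
        Match<Integer> rdfs = withSubClassOf(base, 1, Map.of(10, Set.of(11)));
        rdfs.match(100, 1, null).forEach(System.out::println);
    }
}
```

If the id for rdf:type or for a superclass were missing from the id space, there would be nothing to pass as `rdfType` or to emit as an inferred object, which mirrors the limitation described above.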

Aklakan avatar Oct 11 '25 13:10 Aklakan

Please could you bring this up-to-date with 6.0.0? Thanks.

afs avatar Dec 15 '25 13:12 afs

> Fixed literals-in-subject inferences due to range declarations (easy fix)

To help me, where exactly is this?
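
For context, the class of inference under discussion can be sketched as follows. This is an illustration only and not the code changed by the PR: `RangeInferenceSketch`, the string-based term encoding, and the quote-prefix literal check are all assumptions. A range declaration `:p rdfs:range :C` together with a data triple `:s :p o` yields `o rdf:type :C`; when `o` is a literal, that inferred triple would have a literal in the subject position, which is not valid RDF, so it has to be suppressed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RangeInferenceSketch {
    // Assumption for this sketch: literals are written with a leading double quote.
    static boolean isLiteral(String term) {
        return term.startsWith("\"");
    }

    /** Derives (o rdf:type C) from (s p o) and (p rdfs:range C), skipping literal objects. */
    static List<List<String>> inferFromRange(List<List<String>> data, Map<String, String> rangeOf) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> t : data) {
            String rangeClass = rangeOf.get(t.get(1));
            if (rangeClass == null) continue;   // no range declared for this property
            String o = t.get(2);
            if (isLiteral(o)) continue;         // a literal cannot be the subject of a triple
            out.add(List.of(o, "rdf:type", rangeClass));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(
                List.of(":s", ":p", ":o"),
                List.of(":s", ":p", "\"text\""));
        System.out.println(inferFromRange(data, Map.of(":p", ":C")));
    }
}
```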

afs avatar Dec 15 '25 13:12 afs

> It uses :directType rdfs:subPropertyOf rdf:type.

In the schema presumably.

Subproperties of the RDFS vocabulary are not intended to be supported by this data-centric RDFS system (rdfs:subClassOf, rdfs:domain, rdfs:range).

"messing around with the scaffolding"

It ought to raise an error in all four cases.
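
One way such a rejection could look is sketched below. This is a hypothetical illustration and not existing Jena behavior: `SchemaCheckSketch`, the string-based triples, and in particular the contents of the `RESERVED` set are assumptions, and the thread does not spell out which cases are meant. The sketch simply refuses any schema triple that declares a subproperty of a built-in vocabulary term.

```java
import java.util.List;
import java.util.Set;

final class SchemaCheckSketch {
    // Assumption: the vocabulary terms the reasoner treats as built-in.
    static final Set<String> RESERVED = Set.of(
            "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range");

    /** Throws if the schema declares a subproperty of a reserved vocabulary term. */
    static void check(List<List<String>> schema) {
        for (List<String> t : schema) {
            boolean isSubPropertyDecl = "rdfs:subPropertyOf".equals(t.get(1));
            if (isSubPropertyDecl && RESERVED.contains(t.get(2))) {
                throw new IllegalArgumentException("Unsupported schema triple: " + t);
            }
        }
    }

    public static void main(String[] args) {
        check(List.of(List.of(":p", "rdfs:subPropertyOf", ":q"))); // accepted
        try {
            check(List.of(List.of(":directType", "rdfs:subPropertyOf", "rdf:type")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at schema setup time, rather than silently ignoring the triple, makes the unsupported construct visible before any query runs.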

afs avatar Dec 17 '25 19:12 afs