jena icon indicating copy to clipboard operation
jena copied to clipboard

rdfdiff

Open william-vw opened this issue 1 year ago • 5 comments

Version

4.9.0

What happened?

rdfdiff is not functioning as expected. will issue a PR with suggested fixes & tests.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Yes

william-vw avatar Aug 15 '23 21:08 william-vw

see PR #1989

william-vw avatar Aug 15 '23 21:08 william-vw

Bug reports should contain actual steps to reproduce the issue - "not functioning as expected" - is not a useful bug report. Please provide steps, including sample data, that show scenario(s) where two input graphs don't yield the expected diff

rvesse avatar Aug 16 '23 08:08 rvesse

Latest changed to 4.9.0 ... so it is always valid in the future :-)

afs avatar Aug 16 '23 10:08 afs

Bug reports should contain actual steps to reproduce the issue - "not functioning as expected" - is not a useful bug report. Please provide steps, including sample data, that show scenario(s) where two input graphs don't yield the expected diff

You are right - apologies. I should've waited with this issue until I had enough time to elaborate.

The issues I found: (lines refer to original code)

  • Any statements with blank nodes are seen as concrete (lines 120, 135). (Node_Blank is a subclass of Node_Concrete, and thus returns true for #isConcrete, which I think is the issue here.)

  • A subgraph from the second set should be added to the candidates list, not the subgraph from first set that is being compared with (line 180).

  • For a statement's first non-encountered bnode, it creates a new subgraph and adds the statement and its forward closure to it (lines 228-235).

However, this does not support cases where multiple nodes end at the same bnode. If you run on testing/cmd/rdfdiff_1.ttl and testing/cmd/rdfdiff_2.ttl (after fixing the above), you see that the "England" triple is ignored as it was not part of the closure of the bnode shared with the "America" triple.

There's a few solutions for this - my original PR implemented one potential solution, but now I'm unsure whether it's the way to go - perhaps adding a different kind of closure would be better.

william-vw avatar Aug 16 '23 12:08 william-vw

Blank nodes are concrete. "concrete" covers the RDF terms that can go in an RDF graph and be printed.

The test "triple.concrete" isn't appropriate (from reading the code).

afs avatar Aug 17 '23 11:08 afs