`transitive_subjects` and `transitive_objects` return the starting node as first element

Open • RinkeHoekstra opened this issue 1 year ago • 1 comment

Both `transitive_subjects` and `transitive_objects` yield the starting node as their first element. This is unintuitive, and it does not match the behaviour described in the docstring.

A bit more detail from the example by @jjon in #1303:

>>> pprint(list(cg.transitive_subjects(RDF.type, pome.Person)))
[rdflib.term.URIRef('http://prosopOnto.medieval.england/2006/04/pome#Person'),
 rdflib.term.URIRef('http://example.com/thisgraph#Hugh_Despenser'),
 rdflib.term.URIRef('http://example.com/thisgraph#Audley_Henry_de'),
 rdflib.term.URIRef('http://example.com/thisgraph#Thomas_earl_of_Warwick_d_1242'),
.
.
. etc.
]

The `transitive_subjects` method yields `pome:Person` even though it is not the subject of any triple with `rdf:type` as predicate and `pome:Person` as object.

In issue #1303 @white-gecko suggests that you can "just skip the first element when working with the list", but this means that every caller of these methods has to remember to skip the first element.
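For reference, a minimal sketch of the workaround every caller would need, assuming the generator always yields the starting node first. `skip_start` is a hypothetical helper, not part of rdflib, and the plain list below only stands in for the output of `transitive_subjects`.

```python
from itertools import islice


def skip_start(nodes):
    """Drop the first item of an iterable, assumed to be the starting node."""
    return islice(nodes, 1, None)


# Stand-in for list(cg.transitive_subjects(RDF.type, pome.Person));
# the strings are placeholders, not real rdflib terms.
result = ["pome:Person", "ex:Hugh_Despenser", "ex:Audley_Henry_de"]

subjects = list(skip_start(iter(result)))
```

Note that this drops the first element unconditionally, so it is only correct as long as the current behaviour stays unchanged.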

Suggested fixes:

  • Update the code to make it behave as expected (this may be non-trivial given that the existing behaviour is due to the anti-recursive check)
  • If that fails, update the docstring to reflect actual behaviour.
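To illustrate the first option, here is a rough sketch on plain Python tuples rather than on an rdflib `Graph`. It is a model of the behaviour described in this issue, not rdflib's actual code: the function names, the list-of-tuples graph, and the `"sub"` predicate are all assumptions made for the example. The first function yields the node before recursing, which reproduces the reported result; the second only yields nodes reached by following at least one triple, while still guarding against cycles.

```python
def subjects(triples, predicate, obj):
    """Yield every s such that (s, predicate, obj) is in triples."""
    for s, p, o in triples:
        if p == predicate and o == obj:
            yield s


def transitive_subjects_current(triples, predicate, obj, remember=None):
    """Model of the reported behaviour: each node is yielded before recursing."""
    if remember is None:
        remember = set()
    if obj in remember:  # cycle guard
        return
    remember.add(obj)
    yield obj  # this line is what puts the starting node first
    for s in subjects(triples, predicate, obj):
        yield from transitive_subjects_current(triples, predicate, s, remember)


def transitive_subjects_strict(triples, predicate, obj):
    """Variant that yields only nodes reached via at least one triple."""
    seen = set()  # cycle guard; the starting node is deliberately not pre-added
    stack = [obj]
    while stack:
        node = stack.pop()
        for s in subjects(triples, predicate, node):
            if s not in seen:
                seen.add(s)
                yield s
                stack.append(s)


triples = [("b", "sub", "a"), ("c", "sub", "b"), ("d", "sub", "a")]
current = list(transitive_subjects_current(triples, "sub", "a"))
strict = list(transitive_subjects_strict(triples, "sub", "a"))
```

In this model `current` starts with `"a"`, while `strict` contains only `"b"`, `"c"` and `"d"`. Because the starting node is not pre-added to `seen`, the strict variant would still yield it when it is genuinely reachable through a cycle, which is arguably the intuitive reading of "transitive".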

https://github.com/RDFLib/rdflib/blob/e09ce43f2844d0b0f96ec5b976015901f9268873/rdflib/graph.py#L1141-L1181

RinkeHoekstra, Sep 21 '23 07:09