feat: proposed Dataset API changes

Open recalcitrantsupplant opened this issue 1 year ago • 5 comments

This draft PR is intended to promote discussion and reach a consensus on the proposed changes to the Dataset API by concretely describing what the interfaces would look like. As such, it is not intended at this point that all of the required changes to dependencies and tests are implemented.

The following discussion gives context: https://github.com/RDFLib/rdflib/discussions/2591

Additionally:

  • There is a general writeup in /dataset_api.md
  • Examples in examples/datasets.py

Summary of changes:

  • Created a separate dataset.py module which contains only the Dataset class
  • Removed the inheritance of Dataset from ConjunctiveGraph/Graph
  • Stubbed methods / function signatures with expected input and output types
    • Added methods proposed/discussed here https://github.com/RDFLib/rdflib/discussions/2591#discussion-5619172
    • Added stubs for other expected methods based on existing and inherited methods.
    • Updated methods for triples, quads, subjects, subjects_objects etc. to accept "slicing" where iterables of terms can be passed in to filter on a set of terms.
    • Added graph= parameter and corresponding Enum for GraphType to allow filtering operations over the Default Graph or all Named Graphs
  • Removed ConjunctiveGraph class
  • Started removing the identifier attribute on Graph. There will still be many references to this.
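
To make the "slicing" and graph= ideas above concrete, here is a minimal, self-contained sketch. It does not use rdflib: ToyDataset, the string terms, and the GraphType member names DEFAULT and NAMED are all assumptions made for illustration, and the stubs in the PR may differ.

```python
from enum import Enum
from typing import Iterable, Iterator, List, Optional, Tuple, Union

Term = str  # stand-in for an RDF term (assumption: real code would use rdflib terms)
Quad = Tuple[Term, Term, Term, Optional[Term]]  # graph component None = default graph
Pattern = Union[None, Term, Iterable[Term]]


class GraphType(str, Enum):
    """Hypothetical members; the PR only states that such an Enum exists."""

    DEFAULT = "default"
    NAMED = "named"


def _as_term_set(pattern: Pattern) -> Optional[frozenset]:
    """Normalise a pattern: None = wildcard, one term, or an iterable of terms."""
    if pattern is None:
        return None
    if isinstance(pattern, str):
        return frozenset([pattern])
    return frozenset(pattern)


class ToyDataset:
    """Illustrative in-memory store, not the proposed Dataset class."""

    def __init__(self, quads: Iterable[Quad]) -> None:
        self._quads: List[Quad] = list(quads)

    def quads(self, s: Pattern = None, p: Pattern = None, o: Pattern = None,
              graph: Optional[str] = None) -> Iterator[Quad]:
        s_set, p_set, o_set = _as_term_set(s), _as_term_set(p), _as_term_set(o)
        for qs, qp, qo, qg in self._quads:
            # graph= restricts the search to the default graph or to named graphs.
            if graph == GraphType.DEFAULT and qg is not None:
                continue
            if graph == GraphType.NAMED and qg is None:
                continue
            # Each position accepts a wildcard or a set of acceptable terms.
            if s_set is not None and qs not in s_set:
                continue
            if p_set is not None and qp not in p_set:
                continue
            if o_set is not None and qo not in o_set:
                continue
            yield (qs, qp, qo, qg)


ds = ToyDataset([
    ("ex:a", "ex:p", "1", None),
    ("ex:b", "ex:p", "2", "ex:g1"),
    ("ex:c", "ex:p", "3", "ex:g1"),
])
sliced = list(ds.quads(s=["ex:a", "ex:b"]))          # filter on a set of subjects
named_only = list(ds.quads(graph=GraphType.NAMED))   # only quads in named graphs
```

Because GraphType mixes in str in this sketch, passing the plain string "default" behaves the same as passing GraphType.DEFAULT.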

recalcitrantsupplant avatar Jan 29 '25 01:01 recalcitrantsupplant

Re:

  • Removed ConjunctiveGraph class

Does it make sense to apply removal of ConjunctiveGraph in response to #3064, in advance of accepting this PR? If not globally, then what about in just the JSON-LD serializer?

I'm asking without having reviewed all of the proposed typing here yet.

ajnelson-nist avatar Feb 10 '25 16:02 ajnelson-nist

My general comment is that I don't like the new arguments of triples() and graphs(). IMO the equivalent result should be achieved using default_graph and get_named_graph(). There shouldn't be multiple ways to achieve the same thing.

If filtering needs to be done, it can be achieved with a simple map() and lambda.
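
For illustration only, that kind of filtering might look like the sketch below. Plain tuples stand in for the triples a default graph would yield (an assumption, no rdflib types are used); filter() does the selection and map() the projection.

```python
# Plain tuples stand in for the triples a default graph would yield.
default_graph_triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:name", "Carol"),
]

wanted_subjects = {"ex:alice", "ex:carol"}

# Keep only triples whose subject is one of the wanted terms.
selected = list(filter(lambda t: t[0] in wanted_subjects, default_graph_triples))

# Project out a single component, here the object of each selected triple.
objects = list(map(lambda t: t[2], selected))
```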

namedgraph avatar Feb 11 '25 19:02 namedgraph

One more question arose while I was commenting: what is the graph component for the default graph in the ds.graph() result? It used to be DATASET_DEFAULT_GRAPH_ID = URIRef("urn:x-rdflib:default"), but now that Graph.identifier is gone, what is it? It would make sense to use None, but then the signature of Dataset.quads() becomes ambiguous in the case of quads(None, None, None, None) -- is the last None a wildcard or does it signify the default graph?

namedgraph avatar Feb 11 '25 19:02 namedgraph

> Re:
>
>   • Removed ConjunctiveGraph class
>
> Does it make sense to apply removal of ConjunctiveGraph in response to #3064, in advance of accepting this PR? If not globally, then what about in just the JSON-LD serializer?
>
> I'm asking without having reviewed all of the proposed typing here yet.

I wouldn't think so: ConjunctiveGraph is used in a few different places, and removing it would break an inheritance hierarchy, so it's not a simple change. The JSON-LD serializer could potentially switch to Dataset while leaving ConjunctiveGraph in place, though I haven't looked at it.

recalcitrantsupplant avatar Feb 16 '25 23:02 recalcitrantsupplant

> One more question arose while I was commenting: what is the graph component for the default graph in the ds.graph() result? It used to be DATASET_DEFAULT_GRAPH_ID = URIRef("urn:x-rdflib:default"), but now that Graph.identifier is gone, what is it? It would make sense to use None, but then the signature of Dataset.quads() becomes ambiguous in the case of quads(None, None, None, None) -- is the last None a wildcard or does it signify the default graph?

Good question. I would suggest that None is an expected output from methods and can mean that results are from the default graph, but that we are not in fact making the two equivalent. When providing input to functions, None != default graph, and a user should not use it to mean this. The following can be used to refer unambiguously to the default graph:

    ds.triples(graph="default")
    ds.quads(graph="default")
    ds.default_graph.triples()
    etc.
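
A rough sketch of those semantics, using a toy in-memory list in place of the real Dataset (STORE and this standalone quads() function are hypothetical stand-ins, not the PR's implementation): None as input is always a wildcard, None as the fourth component of a result marks the default graph, and graph="default" is the explicit selector.

```python
from typing import Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]

# Toy data: the graph component is None for quads held in the default graph.
STORE: List[Quad] = [
    ("ex:a", "ex:p", "1", None),     # default graph
    ("ex:b", "ex:p", "2", "ex:g1"),  # named graph ex:g1
]


def quads(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None, graph: Optional[str] = None) -> Iterator[Quad]:
    """As input, None is a wildcard in every position, including graph."""
    for quad in STORE:
        qs, qp, qo, qg = quad
        if any(pat is not None and pat != val
               for pat, val in ((s, qs), (p, qp), (o, qo))):
            continue
        if graph == "default":
            # Explicit selector: keep only quads from the default graph.
            if qg is not None:
                continue
        elif graph is not None and graph != qg:
            # Any other value is treated as the name of a graph.
            continue
        yield quad


everything = list(quads(None, None, None, None))  # None never means "default graph"
default_only = list(quads(graph="default"))
```

Under these assumptions, quads(None, None, None, None) matches every graph, so the ambiguity raised above only exists if None is also accepted as an input meaning "default graph".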

recalcitrantsupplant avatar Feb 17 '25 00:02 recalcitrantsupplant