rdf4h
RDF lists
This PR is a work in progress to add some functions for dealing with RDF lists: creating and querying them.
@robstewart57 I would like to create a function `addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node)`, but I would need to be able to create blank nodes and ensure they are not already used in the graph.
At the moment, there is no way to ensure that, for example, `BNodeGen i` or `BNode l` is not already used in the graph without actually querying it. Maybe we should use some strong random UUID?
EDIT: changed the signature of `addRdfList`; it was: `addRdfList :: (Rdf r) => RDF r -> [Node] -> Node`
For creating blank nodes, if it's going to be a randomly generated UUID then presumably it'd have to live in the IO monad e.g.
createBlankNode :: IO Node
Or else if given a label:
blankNodeFromLabel :: Text -> Node
Or if the UUID should be validated:
blankNodeFromLabel :: Text -> Maybe Node
Or if you want to ensure that a blank node UUID is unique in an RDF graph then maybe
createBlankNode :: (Rdf a) => RDF a -> IO Node
This would create a random UUID, validate it, and check that no such blank node already exists in the RDF graph (the graph is the first argument).
It's a pity that it requires IO. Could we do the following:
- Move every type definition from `Data.RDF.Types` to `Data.RDF.Internal.Types` and export all the types and their constructors. Being "internal", this module should be used with caution.
- In `Data.RDF.Types`, re-export the previous types without their constructors, and export smart constructors, but none that produces `BNodeGen`.
- Add a method and an internal counter to the graph to produce `BNodeGen` securely.
Then the only way to produce `BNodeGen` without using the internal module would be to use the secure pure graph method. But one issue is if you have several graphs...
Anyway, I think it would be good to have a module marked as "internal" and encourage users to use smart constructors.
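To make the counter idea and the "several graphs" caveat concrete, here is a minimal sketch. It does not use rdf4h: `Node`, `Triple`, `Graph`, `emptyGraph` and `freshBNode` are simplified, hypothetical stand-ins defined only for illustration, and the real `Rdf` class has no such counter as far as this thread says.

```haskell
-- Simplified stand-ins for rdf4h's node and triple types (assumption,
-- not the library's actual definitions).
data Node = UNode String | BNodeGen Int
  deriving (Eq, Show)

type Triple = (Node, Node, Node)

-- A toy graph that stores the next unused BNodeGen id next to its triples.
data Graph = Graph
  { nextId  :: Int
  , triples :: [Triple]
  } deriving (Eq, Show)

emptyGraph :: Graph
emptyGraph = Graph { nextId = 0, triples = [] }

-- Pure allocation: return the graph with its counter bumped, plus a blank
-- node that this graph has not handed out before.
freshBNode :: Graph -> (Graph, Node)
freshBNode g = (g { nextId = nextId g + 1 }, BNodeGen (nextId g))

-- Allocating twice from the same graph threads the counter, so the ids differ.
twoFromOneGraph :: (Node, Node)
twoFromOneGraph =
  let (g1, a) = freshBNode emptyGraph
      (_,  b) = freshBNode g1
  in (a, b)

-- Two separate graphs start from the same counter and hand out the same id,
-- which is the "several graphs" problem: merging them would conflate nodes.
oneFromEachOfTwoGraphs :: (Node, Node)
oneFromEachOfTwoGraphs =
  (snd (freshBNode emptyGraph), snd (freshBNode emptyGraph))
```

In this toy model, uniqueness holds only within a single graph value; combining graphs would need renaming or some shared source of ids.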
What is the motivation behind hiding the generation of blank nodes?
E.g. the Java based Jena framework has the NodeFactory API for creating nodes in applications, including blank nodes:
https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/NodeFactory.html#createBlankNode--
and when the user provides a specific string label:
https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/NodeFactory.html#createBlankNode-java.lang.String-
The issue is to avoid identifier collisions without degrading performance. In my example with the RDF list, I would like a pure function `addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node)` which takes a graph and a list of nodes, then:
- Create an RDF list using brand new blank nodes (i.e. not already in the graph).
- "Add" the generated triples to the graph.
- Return the new graph and the root of the RDF list.
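The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the `Graph` type with a `nextId` counter is a hypothetical stand-in, not rdf4h's `RDF r`, so the signature here is `Graph -> [Node] -> (Graph, Node)` rather than the one proposed in this PR. The `rdf:first`, `rdf:rest` and `rdf:nil` IRIs are the standard RDF collection vocabulary.

```haskell
-- Simplified stand-ins for rdf4h's types (assumption).
data Node = UNode String | BNodeGen Int
  deriving (Eq, Show)

type Triple = (Node, Node, Node)

-- Toy graph carrying the next unused BNodeGen id (hypothetical design).
data Graph = Graph
  { nextId  :: Int
  , triples :: [Triple]
  } deriving (Eq, Show)

rdfFirst, rdfRest, rdfNil :: Node
rdfFirst = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
rdfRest  = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
rdfNil   = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

-- Build one list cell per element, each on a fresh blank node, add the
-- resulting triples to the graph, and return the new graph with the node
-- of the first cell (rdf:nil for the empty list).
addRdfList :: Graph -> [Node] -> (Graph, Node)
addRdfList g []       = (g, rdfNil)
addRdfList g (x : xs) =
  let cell       = BNodeGen (nextId g)            -- fresh: never issued by g
      g1         = g { nextId = nextId g + 1 }    -- reserve the id
      (g2, rest) = addRdfList g1 xs               -- build the tail first
      new        = [(cell, rdfFirst, x), (cell, rdfRest, rest)]
  in (g2 { triples = new ++ triples g2 }, cell)
```

No lookup in the existing triples is needed, which is the performance point made above: freshness comes from the counter alone.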
The Turtle parser is a good illustration of the process: the 3rd item of `ParseState` contains the value of the next id to be used for `BNodeGen`, thus ensuring there will be no collisions.
To achieve this, it would be good not to have to check whether the nodes are already in the graph, to avoid a performance drop. A node factory is a good idea: it could be part of the definition of the graph (pure) or independent (possibly not pure, like the random UUID, which requires `IO`).
I prefer the first option, which is elegant. There would still be a "backdoor" using the original `BNodeGen` constructor imported from the internal module. In this case the user keeps full control of the creation of blank nodes.
Other relevant examples that require avoiding collisions: applications of ontologies such as PROV-O (W3C provenance) or QUDT (units of measurement), which usually need to create lots of intermediary blank nodes.
I think it is best to let each application decide how to implement its node factory. This PR is ready for review.
@wismill thanks for this PR.
Do you have a small example of `getRdfList` in action? It might be worthwhile giving such an example as Haddock documentation for this function, since "Get an RDF list, given its root." might not mean much to those unfamiliar with collections.
Also, what do you mean by "root"?
Secondly, how does:
getRdfList :: (Rdf r) => RDF r -> Node -> [Node]
relate to the earlier discussion about:
addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node)
and about creating unique blank nodes?
I've added documentation.
The idea was to have:
l :: [Node]
(g', n) = addRdfList g l
getRdfList g' n == l
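For readers unfamiliar with RDF collections, here is a small sketch of what reading a list back could look like. It is not the PR's implementation: it works on a plain list of triples with simplified, hypothetical `Node` and `Triple` types instead of rdf4h's `RDF r`, and `objectOf` is a helper invented for this example. "Root" is taken here to mean the node of the first list cell, which is presumably what the PR intends.

```haskell
import Data.Maybe (listToMaybe)

-- Simplified stand-ins for rdf4h's types (assumption).
data Node = UNode String | BNodeGen Int
  deriving (Eq, Show)

type Triple = (Node, Node, Node)

rdfFirst, rdfRest, rdfNil :: Node
rdfFirst = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
rdfRest  = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
rdfNil   = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

-- Object of the first triple with the given subject and predicate, if any.
objectOf :: [Triple] -> Node -> Node -> Maybe Node
objectOf ts s p = listToMaybe [o | (s', p', o) <- ts, s' == s, p' == p]

-- Follow rdf:rest links from the root cell, collecting each rdf:first value.
-- Stops at rdf:nil or at a malformed cell; a cyclic list would not terminate.
getRdfList :: [Triple] -> Node -> [Node]
getRdfList ts root
  | root == rdfNil = []
  | otherwise =
      case (objectOf ts root rdfFirst, objectOf ts root rdfRest) of
        (Just x, Just next) -> x : getRdfList ts next
        _                   -> []

-- The list (ex:a ex:b) written out as cells; BNodeGen 0 is its root.
example :: [Triple]
example =
  [ (BNodeGen 0, rdfFirst, UNode "ex:a")
  , (BNodeGen 0, rdfRest,  BNodeGen 1)
  , (BNodeGen 1, rdfFirst, UNode "ex:b")
  , (BNodeGen 1, rdfRest,  rdfNil)
  ]
```

With these definitions, `getRdfList example (BNodeGen 0)` evaluates to `[UNode "ex:a", UNode "ex:b"]`, which is the round-trip behaviour described above.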
Not going to work further on this