rdf4h

RDF lists

Status: Open. wismill opened this issue 6 years ago • 9 comments

This PR is a work in progress that adds functions for creating and querying RDF lists.

wismill, Jul 23 '18 14:07

@robstewart57 I would like to create a function addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node), but to do so I need to be able to create blank nodes and ensure they are not already used in the graph.

At the moment, there is no way to ensure that, for example, BNodeGen i or BNode l is not already used in the graph without actually querying it. Maybe we should use a strong random UUID?

EDIT: changed the signature of addRdfList; it was: addRdfList :: (Rdf r) => RDF r -> [Node] -> Node
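To make the cost of "querying the graph" concrete, here is a minimal sketch of that approach. It assumes a graph is just a list of triples and uses simplified stand-ins for rdf4h's Node and Triple types (String instead of Text, no literal values); these are not rdf4h's real definitions, and isUnused and nextBNodeGen are hypothetical helpers. Both the freshness check and choosing the next BNodeGen index have to scan every triple.

```haskell
import Data.List (foldl')

-- Simplified stand-ins for rdf4h's types (String instead of Text, no LValue).
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Every node mentioned in a triple: subject, predicate, object.
tripleNodes :: Triple -> [Node]
tripleNodes (Triple s p o) = [s, p, o]

-- Hypothetical naive freshness check: scans all triples on every call.
isUnused :: [Triple] -> Node -> Bool
isUnused ts n = n `notElem` concatMap tripleNodes ts

-- Hypothetical naive generator: one past the largest BNodeGen index in use
-- (BNodeGen 0 for a graph without any). This also requires a full scan.
nextBNodeGen :: [Triple] -> Node
nextBNodeGen ts = BNodeGen (1 + foldl' maxGen (-1) (concatMap tripleNodes ts))
  where
    maxGen acc (BNodeGen i) = max acc i
    maxGen acc _            = acc
```

Each call is linear in the size of the graph, which is the overhead a counter or a random identifier would avoid.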

wismill, Jul 23 '18 14:07

For creating blank nodes, if it's going to be a randomly generated UUID, then presumably it would have to live in the IO monad, e.g.:

createBlankNode :: IO Node

Or else if given a label:

blankNodeFromLabel :: Text -> Node

Or, if the UUID label should be validated:

blankNodeFromLabel :: Text -> Maybe Node
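A minimal sketch of the validating variant, assuming "validated" means the label must have the textual UUID shape of 8-4-4-4-12 hexadecimal digits. It uses String instead of Text and a simplified Node stand-in so it depends only on base; the UUID version and variant bits are deliberately not checked, and splitOn is a small hypothetical helper.

```haskell
import Data.Char (isHexDigit)

-- Simplified stand-in for rdf4h's Node type (String instead of Text).
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

-- Sketch of blankNodeFromLabel :: Text -> Maybe Node, on String.
-- Accepts only labels shaped like "123e4567-e89b-12d3-a456-426614174000".
blankNodeFromLabel :: String -> Maybe Node
blankNodeFromLabel s
  | isUuid s  = Just (BNode s)
  | otherwise = Nothing

-- Five hyphen-separated groups of hex digits with lengths 8, 4, 4, 4, 12.
isUuid :: String -> Bool
isUuid s = map length groups == [8, 4, 4, 4, 12] && all (all isHexDigit) groups
  where
    groups = splitOn '-' s

-- Split on a delimiter character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn d str = case break (== d) str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest
```

Returning Maybe lets callers reject malformed labels without exceptions, at the price of handling Nothing at every call site.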

robstewart57, Jul 23 '18 15:07

Or, if you want to ensure that a blank node UUID is unique in an RDF graph, then maybe:

createBlankNode :: (Rdf a) => RDF a -> IO Node

This would create a random UUID, validate it, and check that no such blank node already exists in the RDF graph (the graph is the first argument).
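The generate-check-retry loop described above can be sketched as follows. To stay within base, the random UUID source is passed in as an IO String action (an assumption: in practice it might come from a UUID library), the graph is a plain list of triples, and Node/Triple are simplified stand-ins for rdf4h's types. createBlankNodeWith and counterLabels are hypothetical names.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Sketch of createBlankNode :: RDF a -> IO Node with an explicit label source.
-- Draws labels until the resulting blank node does not occur in the graph.
createBlankNodeWith :: IO String -> [Triple] -> IO Node
createBlankNodeWith newLabel graph = loop
  where
    used = concatMap (\(Triple s p o) -> [s, p, o]) graph
    loop = do
      label <- newLabel
      let candidate = BNode label
      if candidate `elem` used then loop else pure candidate

-- Deterministic stand-in for a random source, handy for testing:
-- yields "b0", "b1", "b2", ... on successive calls.
counterLabels :: IO (IO String)
counterLabels = do
  ref <- newIORef (0 :: Int)
  pure $ do
    i <- readIORef ref
    writeIORef ref (i + 1)
    pure ("b" ++ show i)
```

With a real random UUID source a retry is practically never needed, but the membership check still costs a scan of the graph on every call.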

robstewart57, Jul 23 '18 15:07

It's a pity that it requires IO. Could we do the following?

  • Move all type definitions from Data.RDF.Types to Data.RDF.Internal.Types and export all the types with their constructors. Being "internal", this module should be used with caution.
  • In Data.RDF.Types, re-export these types without their constructors, and export smart constructors, but none that produce BNodeGen.
  • Add a method and an internal counter to the graph to safely produce BNodeGen nodes.
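A minimal sketch of the third point, assuming a toy graph type that is a list of triples plus the next free BNodeGen index. The Graph type, newBNode and addTriple are hypothetical, and Node/Triple are simplified stand-ins for rdf4h's types. In the proposed layout the Graph constructor would live in the internal module, so ordinary code could only obtain BNodeGen values through newBNode.

```haskell
-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Hypothetical graph carrying an internal counter for blank node ids.
data Graph = Graph
  { graphTriples :: [Triple]
  , nextGenId    :: Int
  } deriving Show

emptyGraph :: Graph
emptyGraph = Graph [] 0

-- Pure blank node creation: no scan of the triples is needed, provided
-- BNodeGen values are only ever produced through this function.
newBNode :: Graph -> (Graph, Node)
newBNode g = (g { nextGenId = i + 1 }, BNodeGen i)
  where
    i = nextGenId g

addTriple :: Triple -> Graph -> Graph
addTriple t g = g { graphTriples = t : graphTriples g }
```

Creating a node is O(1), but the guarantee only holds within a single graph value.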

Then the only way to produce BNodeGen without using the internal module would be the safe, pure graph method. One issue, though, arises when you have several graphs...
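The several-graphs problem can be illustrated with the same toy counter-in-the-graph model (hypothetical types, simplified stand-ins for rdf4h's). Two graphs numbered independently both start at BNodeGen 0, so a naive union conflates blank nodes that RDF merge semantics require to be kept apart. Shifting the second graph's ids past the first graph's counter, shown in unionRenumbered, is one possible workaround, not something proposed in this thread.

```haskell
-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Hypothetical graph with an internal counter for blank node ids.
data Graph = Graph { graphTriples :: [Triple], nextGenId :: Int }
  deriving Show

emptyGraph :: Graph
emptyGraph = Graph [] 0

newBNode :: Graph -> (Graph, Node)
newBNode g = (g { nextGenId = nextGenId g + 1 }, BNodeGen (nextGenId g))

addTriple :: Triple -> Graph -> Graph
addTriple t g = g { graphTriples = t : graphTriples g }

-- Naive union: BNodeGen 0 from each input becomes the same node.
unionNaive :: Graph -> Graph -> Graph
unionNaive a b =
  Graph (graphTriples a ++ graphTriples b) (max (nextGenId a) (nextGenId b))

-- Workaround sketch: offset the second graph's BNodeGen ids by the first
-- graph's counter, so both id ranges stay disjoint.
unionRenumbered :: Graph -> Graph -> Graph
unionRenumbered a b =
  Graph (graphTriples a ++ map shiftT (graphTriples b)) (off + nextGenId b)
  where
    off = nextGenId a
    shift (BNodeGen i) = BNodeGen (i + off)
    shift n            = n
    shiftT (Triple s p o) = Triple (shift s) (shift p) (shift o)
```

The renumbering touches every triple of the second graph, so the per-graph counter moves the cost from node creation to merging.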

Anyway, I think it would be good to have a module marked as "internal" and encourage users to use smart constructors.

wismill, Jul 23 '18 15:07

What is the motivation behind hiding the generation of blank nodes?

E.g. the Java-based Jena framework has the NodeFactory API for creating nodes in applications, including blank nodes:

https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/NodeFactory.html#createBlankNode--

and when the user provides a specific string label:

https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/NodeFactory.html#createBlankNode-java.lang.String-

robstewart57, Jul 25 '18 17:07

The issue is avoiding identifier collisions without degrading performance. In my RDF list example, I would like a pure function addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node) that takes a graph and a list of nodes, and then:

  1. Creates an RDF list using brand-new blank nodes (i.e. ones not already in the graph).
  2. "Adds" the generated triples to the graph.
  3. Returns the new graph and the root of the RDF list.

The Turtle parser is a good illustration of the process: the third component of ParseState holds the next id to be used for BNodeGen, which ensures there are no collisions.
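The three steps, combined with a counter in the style of the Turtle parser's ParseState, could look roughly like this. It is a sketch on a toy graph type (hypothetical, with simplified stand-ins for rdf4h's Node and Triple), not the PR's implementation. Each list cell gets a fresh BNodeGen from the graph's counter and is linked with the standard rdf:first, rdf:rest and rdf:nil vocabulary; the empty list is represented by rdf:nil itself.

```haskell
-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Hypothetical graph with an internal counter for blank node ids.
data Graph = Graph { graphTriples :: [Triple], nextGenId :: Int }
  deriving Show

rdfFirst, rdfRest, rdfNil :: Node
rdfFirst = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
rdfRest  = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
rdfNil   = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

-- Sketch of addRdfList: returns the extended graph and the list's root
-- (the first cell, or rdf:nil for an empty list).
addRdfList :: Graph -> [Node] -> (Graph, Node)
addRdfList g [] = (g, rdfNil)
addRdfList g (x : xs) =
  (g' { graphTriples = cellTriples ++ graphTriples g' }, cell)
  where
    -- Take the next id for this cell, then build the tail with the rest.
    cell = BNodeGen (nextGenId g)
    (g', rest) = addRdfList (g { nextGenId = nextGenId g + 1 }) xs
    cellTriples = [Triple cell rdfFirst x, Triple cell rdfRest rest]
```

No lookup in the existing triples is needed, so the cost is proportional to the length of the new list only.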

To achieve this without a performance drop, it would be good not to have to check whether the nodes are already in the graph. A node factory is a good idea: it could be part of the definition of the graph (pure) or independent of it (possibly impure, like the random UUID, which requires IO).
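The two factory options differ mainly in where the counter lives. A rough sketch of both, with hypothetical names (NodeFactory, ioFactory, PureFactory, freshPure) and a simplified Node stand-in: the impure factory keeps its counter in an IORef behind an IO Node action (a random-UUID factory would have the same IO Node shape), while the pure one threads its state explicitly, as it would if it were stored in the graph.

```haskell
import Data.IORef (newIORef, atomicModifyIORef')

-- Simplified stand-in for rdf4h's Node type.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

-- Hypothetical factory interface: an action that yields a new blank node.
newtype NodeFactory m = NodeFactory { newBlankNode :: m Node }

-- Independent, impure factory: a shared counter in an IORef.
ioFactory :: IO (NodeFactory IO)
ioFactory = do
  ref <- newIORef 0
  pure (NodeFactory (atomicModifyIORef' ref (\i -> (i + 1, BNodeGen i))))

-- Pure factory: the caller threads the updated state explicitly.
newtype PureFactory = PureFactory Int

freshPure :: PureFactory -> (Node, PureFactory)
freshPure (PureFactory i) = (BNodeGen i, PureFactory (i + 1))
```

The IO version can be shared across graphs, which sidesteps the several-graphs issue; the pure version keeps functions like addRdfList out of IO.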

I prefer the first option, which is more elegant. There would still be a "backdoor": the original BNodeGen constructor imported from the internal module. In that case the user keeps full control over the creation of blank nodes.

Other examples where collisions must be avoided: applications of ontologies such as PROV-O (W3C provenance) or QUDT (units of measurement), which usually need to create many intermediate blank nodes.

wismill, Jul 25 '18 19:07

I think it is best to let each application decide how to implement its node factory. This PR is ready for review.

wismill, May 21 '19 11:05

@wismill thanks for this PR.

Do you have a small example of getRdfList in action? It might be worth giving such an example in the Haddock documentation for this function, since

Get an RDF list, given its root.

might not mean much to those unfamiliar with RDF collections.

Also, what do you mean by "root"?
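One plausible reading of "root", offered here as an interpretation, is the first cell of the collection: the node that is the subject of the first rdf:first triple, or rdf:nil for an empty list. Under that reading, getRdfList could be sketched as a walk along rdf:rest, collecting each cell's rdf:first value. The graph is a plain list of triples, Node/Triple are simplified stand-ins for rdf4h's types, and objectsOf is a hypothetical helper. This sketch stops at the first malformed cell and does not detect cycles.

```haskell
-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

rdfFirst, rdfRest, rdfNil :: Node
rdfFirst = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
rdfRest  = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
rdfNil   = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

-- Objects of all triples matching (s, p, ?o).
objectsOf :: [Triple] -> Node -> Node -> [Node]
objectsOf ts s p = [o | Triple s' p' o <- ts, s' == s, p' == p]

-- Sketch of getRdfList: follow rdf:rest from the root until rdf:nil.
-- A cell without exactly one rdf:first and one rdf:rest ends the walk.
getRdfList :: [Triple] -> Node -> [Node]
getRdfList ts = go
  where
    go n
      | n == rdfNil = []
      | otherwise = case (objectsOf ts n rdfFirst, objectsOf ts n rdfRest) of
          ([x], [next]) -> x : go next
          _             -> []
```

For a Turtle collection such as ( "a" "b" ), the root is the blank node that the collection syntax expands to, and the walk returns the two members in order.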

Secondly, how does:

getRdfList :: (Rdf r) => RDF r -> Node -> [Node]

relate to the earlier discussion about:

addRdfList :: (Rdf r) => RDF r -> [Node] -> (RDF r, Node)

and about creating unique blank nodes?

robstewart57, May 21 '19 11:05

I've added documentation.

The idea was to have:

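-- for any graph g and list of nodes l (roundTrip is an illustrative name):
roundTrip :: (Rdf r) => RDF r -> [Node] -> Bool
roundTrip g l = getRdfList g' n == l
  where (g', n) = addRdfList g l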
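The intended round-trip property can be checked concretely on a toy model. This self-contained sketch reuses the counter-based addRdfList and the rdf:rest walk for getRdfList on a hypothetical Graph type with simplified stand-ins for rdf4h's Node and Triple; it is an illustration of the invariant, not the PR's code.

```haskell
-- Simplified stand-ins for rdf4h's types.
data Node = UNode String | BNode String | BNodeGen Int | LNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Hypothetical graph with an internal counter for blank node ids.
data Graph = Graph { graphTriples :: [Triple], nextGenId :: Int }
  deriving Show

rdfFirst, rdfRest, rdfNil :: Node
rdfFirst = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"
rdfRest  = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"
rdfNil   = UNode "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

-- Build the list cells with fresh BNodeGen ids; return the root.
addRdfList :: Graph -> [Node] -> (Graph, Node)
addRdfList g [] = (g, rdfNil)
addRdfList g (x : xs) =
  (g' { graphTriples = [Triple cell rdfFirst x, Triple cell rdfRest rest] ++ graphTriples g' }, cell)
  where
    cell = BNodeGen (nextGenId g)
    (g', rest) = addRdfList (g { nextGenId = nextGenId g + 1 }) xs

-- Walk rdf:rest from the root, collecting rdf:first values.
getRdfList :: Graph -> Node -> [Node]
getRdfList g = go
  where
    objs s p = [o | Triple s' p' o <- graphTriples g, s' == s, p' == p]
    go n
      | n == rdfNil = []
      | otherwise = case (objs n rdfFirst, objs n rdfRest) of
          ([x], [next]) -> x : go next
          _             -> []

-- The invariant: reading back the list that was just added yields it again.
roundTrip :: Graph -> [Node] -> Bool
roundTrip g l = getRdfList g' n == l
  where
    (g', n) = addRdfList g l
```

The property holds here only because the counter guarantees the new cells never reuse ids already present in the graph.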

wismill, May 21 '19 12:05

I'm not going to work further on this.

wismill, Jul 05 '24 11:07