rdflib Dataset.parse broken

Using Dataset.parse to parse a graph just doesn't work.

from rdflib import Dataset, URIRef
d = Dataset()
g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle', publicID=URIRef('g:g'))
print("After parse:")
for h in d.graphs(): print(h)
if g.identifier not in d.graphs():
    print("g has not been added to Dataset")

This gives:

After parse:
DEFAULT
g has not been added to Dataset

It turns out that no triple has been added at all.

On the other hand, the same works with ConjunctiveGraph:

from rdflib import ConjunctiveGraph, URIRef
d = ConjunctiveGraph()
g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle')
print("After parse:")
for h in d.contexts(): print(h)
if g not in d.contexts():
    print("g has not been added to ConjunctiveGraph")

which yields

After parse:
[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']].

Jun 08 '13 21:06 uholzer

@iherman : you contributed the dataset class - could you have a look at this?

Jun 15 '13 07:06 gromgull

I had a first look, though not coded yet (I hope to have time for it later). The fact is that the Dataset has no 'parse' method for the moment because we do not have any dataset syntaxes in RDFLib yet... But of course it is true that something ought to be done when a graph syntax is used, my bad.

But... it has to be decided what the outcome of

g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle', publicID=URIRef('g:g'))

is. I see two alternatives:

1. the triples are added to the default graph, and the publicID is ignored
2. publicID is used to either create a new graph on the fly, or identify a graph in the dataset if already there, and the triples are put in that graph

I am tempted to go for #1 (because turtle is NOT a dataset syntax but a graph syntax) but I can see value in #2.
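To make the two alternatives concrete, here is a minimal sketch in plain Python. It is a toy model, not rdflib code: `ToyDataset`, `load_triples`, the `policy` argument and the `DEFAULT` key are all hypothetical, and graphs are simply sets of string triples.

```python
from collections import defaultdict

DEFAULT = "DEFAULT"  # hypothetical key for the default graph


class ToyDataset:
    """Hypothetical dataset: graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load_triples(self, triples, public_id=None, policy="named"):
        """Load triples that came from a graph syntax such as Turtle.

        policy="default": alternative #1, ignore public_id and use the default graph.
        policy="named":   alternative #2, use public_id to create or reuse a named graph
                          (falling back to the default graph if no public_id is given).
        Returns the name of the graph that received the triples.
        """
        if policy == "default" or public_id is None:
            name = DEFAULT
        else:
            name = public_id
        self.graphs[name].update(triples)
        return name


triples = {("a:a", "b:b", "c:c")}

ds1 = ToyDataset()
print(ds1.load_triples(triples, public_id="g:g", policy="default"))  # DEFAULT

ds2 = ToyDataset()
print(ds2.load_triples(triples, public_id="g:g", policy="named"))  # g:g
```

Under #2, parsing a second document with the same publicID simply adds to the existing graph, because `update` merges into the set already stored under that name.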

That being said, what is really necessary is for someone to come up with a TriG parser bound to a Dataset...

Ivan

— Reply to this email directly or view it on GitHub https://github.com/RDFLib/rdflib/issues/301#issuecomment-19492454.

Ivan Herman 4, rue Beauvallon, Clos St. Joseph 13090 Aix-en-Provence France tel: +31-64-1044153 ou +33 6 52 46 00 43 http://www.ivan-herman.net

Jun 17 '13 12:06 iherman

Actually... looking at this again. It is good to consider this with a fresh eye...

I believe the way the Dataset class was created was broken, its author should be fired:-)

Looking at this parsing issue again, alternative #2 from my previous comment does make sense. I change my vote...
However: I actually do not see any reason why the Dataset class should inherit from a ConjunctiveGraph. I just realized that uholzer's remark on the default graph being a union is entirely justified, because that is what a conjunctive graph does: without a context one gets all the triples, and the Dataset inherited this behaviour. One could 'simulate' all this by creating a special context for the dataset's default graph but, frankly, it does not buy one anything.

What I propose is to re-route the Dataset as a subclass of Graph, with a local administration for other graphs in the datasets. The DEFAULT constant becomes, actually, unnecessary, because the dataset itself is also a reference to its default graph, but it does not harm to keep it there. The quad-like methods can remain and some should be added. The structure would become way simpler.
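A rough sketch of the "Dataset as a subclass of Graph" layout proposed above, assuming toy classes rather than rdflib's: `ToyGraph` and `GraphBackedDataset` are hypothetical names. The dataset object is itself the default graph, and the other graphs live in a local name-to-graph table.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph: an identifier plus a set of (s, p, o) triples."""

    def __init__(self, identifier=None):
        self.identifier = identifier
        self.triples = set()

    def add(self, triple):
        self.triples.add(triple)
        return self

    def __len__(self):
        return len(self.triples)


class GraphBackedDataset(ToyGraph):
    """Hypothetical dataset that *is* its own default graph.

    Named graphs are kept in a local name -> ToyGraph table, so plain triple
    operations inherited from ToyGraph only ever touch the default graph.
    """

    def __init__(self):
        super().__init__(identifier=None)
        self._named = {}

    def graph(self, name):
        # Return the named graph, creating an (empty) one on first use.
        if name not in self._named:
            self._named[name] = ToyGraph(name)
        return self._named[name]

    def add_quad(self, quad):
        s, p, o, name = quad
        self.graph(name).add((s, p, o))

    def quads(self):
        # Default-graph triples are reported with None as the graph name.
        for s, p, o in self.triples:
            yield s, p, o, None
        for name, g in self._named.items():
            for s, p, o in g.triples:
                yield s, p, o, name


ds = GraphBackedDataset()
ds.add(("a:a", "b:b", "c:c"))              # goes to the default graph
ds.add_quad(("x:x", "y:y", "z:z", "g:g"))  # goes to the named graph g:g
ds.graph("empty:graph")                    # an empty graph that still exists
print(len(ds))            # 1: the default graph only, not the union
print(sorted(ds._named))  # ['empty:graph', 'g:g']
```

Because the default graph is the object itself, `len(ds)` does not count triples in named graphs; the catch, raised later in this thread, is that such a local table lives only in memory.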

HOWEVER. I do not have a clear view on the way Conjunctive Graphs are stored in non-memory back ends. Would such a simplified view of Datasets create problems?

Thanks

Ivan

Jun 17 '13 12:06 iherman

Thanks Ivan. I am implementing a SPARQL endpoint using an in-memory store and rdflib's SPARQL engine. Basically, I forward queries directly to the SPARQL engine. I'd like to allow the user (the one who sets up the endpoint) to choose whether the default graph is the union or a graph on its own. I would also like empty graphs to be present in the endpoint, but ConjunctiveGraph does not show "empty" contexts. I am also implementing the Graph Store Protocol. Do you think using ConjunctiveGraph is the right way? Currently I am using Dataset.

Jun 17 '13 17:06 uholzer

I think... Dataset and ConjunctiveGraph have a lot in common, both should have quad methods, and way to access a single Graph ... the only two differences should be whether the default graph is the union or not, and whether empty graphs are tracked.

I think they should definitely have some sort of sub-class relationship, whether this is ConjunctiveGraph <= DataSet, or DataSet <= ConjunctiveGraph, or even ContextAwareGraph <= ( ConjunctiveGraph, DataSet )

Or maybe they could even be the same class, and we just have flags that control default graph behaviour?

The ConjunctiveGraph union trick is built into the context-aware store API (although somewhat implicitly) - I would rather not change the store-api at this stage, so the easiest way to store a non-union default graph is to use some magic constant identifier for this. (like Jena uses urn:x-arq:DefaultGraph)
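The "one class with flags" option can be sketched as follows. This is a toy model under stated assumptions: `ContextAwareGraph`, `default_union`, `track_empty` and the in-memory dictionary store are hypothetical, and the magic default-graph identifier is just a constant string chosen for the example, in the spirit of Jena's urn:x-arq:DefaultGraph.

```python
from collections import defaultdict

# Hypothetical "magic" identifier for a non-union default graph
# (the same idea as Jena's urn:x-arq:DefaultGraph).
DEFAULT_GRAPH_ID = "urn:x-rdflib:default"


class ContextAwareGraph:
    """Toy quad container whose default-graph behaviour is chosen by flags."""

    def __init__(self, default_union=True, track_empty=False):
        self.default_union = default_union
        self.track_empty = track_empty
        self._store = defaultdict(set)  # context id -> set of (s, p, o)
        self._declared = set()          # graphs created explicitly, even if empty

    def add(self, triple, context=DEFAULT_GRAPH_ID):
        # A plain triple lands in the magic default context.
        self._store[context].add(triple)

    def graph(self, context):
        if self.track_empty:
            self._declared.add(context)
        return self._store[context]

    def default_triples(self):
        if self.default_union:
            # ConjunctiveGraph-style: the default graph is the union of all contexts.
            return set().union(*self._store.values())
        # Dataset-style: only what was stored under the magic identifier.
        return set(self._store.get(DEFAULT_GRAPH_ID, set()))

    def contexts(self):
        non_empty = {c for c, ts in self._store.items() if ts}
        if self.track_empty:
            return non_empty | self._declared
        return non_empty


cg_like = ContextAwareGraph(default_union=True, track_empty=False)
ds_like = ContextAwareGraph(default_union=False, track_empty=True)
for g in (cg_like, ds_like):
    g.add(("a:a", "b:b", "c:c"), context="g:g")
    g.graph("empty:graph")

print(len(cg_like.default_triples()), sorted(cg_like.contexts()))  # 1 ['g:g']
print(len(ds_like.default_triples()), sorted(ds_like.contexts()))  # 0 ['empty:graph', 'g:g']
```

The two flags are independent, so the other combinations (union default graph with empty graphs tracked, and so on) come for free.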

Jun 18 '13 06:06 gromgull

Another thing:

The fact is that the Dataset has no 'parse' method for the moment because we do not have any dataset syntaxes in RDFLib yet...

That is not true - we have both a trix and a nquads parser. (and I've been whining about a trig parser for years :)

Unless, "DataSet syntax" means something else that I am not aware of.

Jun 18 '13 06:06 gromgull

No, you are right. But that means that those parsers have to be re-written to combine them with Datasets. Something I did not do...

Ivan

Jun 18 '13 06:06 iherman

O.k., this is the information I was waiting for... So Datasets should be a subclass of ConjunctiveGraph, but there should be a separate mechanism for the default graph using the magic ID. However, this would also require reproducing all methods of CGs, which has not been done yet.

But there is a design issue/decision to take. What would be the result of:

g = Dataset()
g.add(triple)  # ie not quad!

I believe that would mean adding the triple to the default graph, ie, the one identified by Dataset.DEFAULT. Which is o.k., conceptually, but that means that the behaviour is fairly different from a ConjunctiveGraph where there is no such 'hidden' context feature...

I do not know when I can implement all this, though. Hopefully before July because if not, it will shift to August...

Ivan

Jun 18 '13 14:06 iherman

But a ConjunctiveGraph does have a (hidden) default context, ConjunctiveGraph.default_context. ConjunctiveGraph.add adds a triple to it. This is the implementation:

    def add(self, triple):
        """Add the triple to the default context"""
        s, p, o = triple  # Python 3 has no tuple parameters, so unpack here
        self.store.add((s, p, o), context=self.default_context, quoted=False)

I think that it is best to implement the features of Dataset in ConjunctiveGraph and add some flags, like gromgull suggested above. This also makes sense because users would probably like to combine the features in other ways than the two that are available now (default graph is union and empty graphs allowed, default graph is union and no empty graphs, default graph is not union but empty graphs allowed, etc.)

Jun 19 '13 20:06 uholzer

Right.

I will have to spend some more time on the ConjunctiveGraph implementation but I see another problem. In terms of an RDF Dataset, each constituent is an RDF graph. The way it should be translated in RDFLib is that if I have a dataset ds, then ds.graph(ID) should return an rdflib.Graph instance. On the other hand, as far as I could see, the ConjunctiveGraph does not have this notion; if I fix a context, what I get is a set of triples, but not an rdflib.Graph instance. That is why the Dataset class has to have its own administration. Similarly, ds.default_context should be an rdflib.Graph. It is not at the moment...

If I forget about the storage implementations, then I still believe the clean approach would be to define and implement Dataset separately, but following the same interface as ConjunctiveGraphs (one of the previous proposals). The original error you raised, namely the fact that the default setting led to the union of all graphs, is an example of behaviour that Dataset inherited without it being intended (because the implementer was careless and stupid)...

Again, I hope to find time to look a little more closely at the CG implementation to understand all the details...

Ivan

Jun 20 '13 04:06 iherman

On the other hand, as far as I could see, the ConjunctiveGraph does not have this notion; if I fix a context, what I get is a set of triples, but not an rdflib.Graph instance.

Maybe I don't understand what you mean, but ConjunctiveGraph.get_context and ConjunctiveGraph.default_context return a Graph:

>>> g = ConjunctiveGraph()
>>> type(g.default_context)
<class 'rdflib.graph.Graph'>
>>> type(g.get_context(URIRef("example:foo")))
<class 'rdflib.graph.Graph'>

You get a fully working rdflib.Graph, albeit bound to the same store.

As far as I know, it is not possible to create a ConjunctiveGraph containing only a strict subset of the graphs of a store. This could be useful for Dataset, for example when the user specifies the Dataset in a SPARQL query using FROM and FROM NAMED. But this functionality is provided by rdflib.graph.ReadOnlyGraphAggregate. It still has the restriction that the graphs must be from the same store.
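The FROM / FROM NAMED use case boils down to a read-only union over a chosen subset of graphs. The sketch below illustrates that idea with a hypothetical `ReadOnlyAggregate` class over plain sets, with a dictionary standing in for the store; it is not the implementation of `rdflib.graph.ReadOnlyGraphAggregate`, and raising `TypeError` on writes is an assumption of the toy.

```python
class ReadOnlyAggregate:
    """Toy read-only union over an explicit subset of graphs (think SPARQL FROM)."""

    def __init__(self, graphs):
        self._graphs = list(graphs)  # each graph is a set of (s, p, o) tuples

    def triples(self, pattern=(None, None, None)):
        # None in the pattern acts as a wildcard; duplicates across graphs are reported once.
        seen = set()
        for g in self._graphs:
            for t in g:
                if t in seen:
                    continue
                if all(want is None or want == have for want, have in zip(pattern, t)):
                    seen.add(t)
                    yield t

    def add(self, triple):
        raise TypeError("this aggregate is read-only")


store = {
    "ex:g1": {("ex:a", "ex:p", "ex:b")},
    "ex:g2": {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")},
    "ex:g3": {("ex:x", "ex:p", "ex:y")},
}
# Roughly: FROM <ex:g1> FROM <ex:g2>, leaving ex:g3 out of the default graph.
agg = ReadOnlyAggregate([store["ex:g1"], store["ex:g2"]])
print(sorted(agg.triples((None, "ex:p", None))))
# [('ex:a', 'ex:p', 'ex:b'), ('ex:a', 'ex:p', 'ex:c')]
```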

Jun 20 '13 09:06 uholzer

On the other hand, as far as I could see, the ConjunctiveGraph does not have this notion; if I fix a context, what I get is a set of triples, but not an rdflib.Graph instance.

No, if you call get_context you get a Graph instance back. It is regrettable that ConjunctiveGraph.get_context and DataSet.graph both exist :)

Jun 20 '13 10:06 gromgull

@uholzer : Mixing and matching graphs from different stores into one DataSet would be nice, but is awkward to implement, at least if you want efficient queries. I must admit I do not quite understand the use-cases for FROM and FROM NAMED... unless you allow your SPARQL engine to LOAD more data on the fly. Which also seems crazy.

Jun 20 '13 10:06 gromgull

Strike what I said:-) I spent some time earlier today and I realized I was wrong insofar as the Conjunctive Graph's contexts are, in fact, Graph instances, i.e., they can be used. I think I know now what I have to do and will do it, eventually. Urs, thanks for drawing my attention to the CG's default context, b.t.w. I have to make use of that.

The Dataset has to be re-written anyway, because the RDF WG has, lately, removed the restriction whereby a graph's name cannot be a blank node. It can. The only complication is that distinguishing the default graph/context from another graph named by a blank node is not that simple any more (in the earlier version the default graph was the only one having a blank node as a name); in what I have on my machine now the default graph/context is set up with a special (urn:rdflib:default:UNIQUENUMBER) URI which, hopefully, no user will use (and the unique number helps us in that). That also means that serializers of datasets can then find out which of the graphs is the default.
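Once graph names may be blank nodes, "the graph named by a blank node" no longer identifies the default graph. A minimal sketch of the special-URI approach described above, assuming a `urn:rdflib:default:` prefix followed by a random `uuid4` hex string; `make_default_graph_id` and `NamedGraphRegistry` are hypothetical names, and blank nodes are represented as `_:label` strings.

```python
import uuid


def make_default_graph_id():
    # Hypothetical scheme: a fixed prefix plus a random unique number, so that
    # no user-supplied graph name (IRI or blank node) can collide with it.
    return "urn:rdflib:default:" + uuid.uuid4().hex


class NamedGraphRegistry:
    """Hypothetical name -> triples table that can tell the default graph apart."""

    def __init__(self):
        self.default_id = make_default_graph_id()
        self.graphs = {self.default_id: set()}

    def graph(self, name):
        # Names may be IRIs or blank-node labels such as "_:b0".
        return self.graphs.setdefault(name, set())

    def is_default(self, name):
        return name == self.default_id

    def named_graphs(self):
        return [n for n in self.graphs if not self.is_default(n)]


reg = NamedGraphRegistry()
reg.graph("_:b0").add(("ex:s", "ex:p", "ex:o"))  # a graph named by a blank node
print(reg.is_default("_:b0"))  # False
print(reg.named_graphs())      # ['_:b0']
```

A serializer could call `is_default` to decide which graph to write out as the default graph.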

I am not sure when I will have a new version ready and tested, though...

Ivan

Jun 21 '13 06:06 iherman

I collected my thoughts on the dataset vs conjunctive graph in another ticket (#307)

The biggest problem I see is that the dataset has no way of persisting the list of graphs that exist. The dataset as implemented now both allows empty graphs to exist, and allows the store to contain triples in graphs that do not exist. This extra knowledge must be stored somewhere. The CG doesn't have this problem as it simply exposes the quads saved in the store.

Jun 21 '13 07:06 gromgull

I am not sure what you mean. If the dataset is a subclass of CG, then contexts are graphs; if the user uses quads then a new graph pops into existence and that is fine. The dataset maintains a name->graph table which has to be updated if such a new graph comes into the picture, so the list of graphs that do exist is maintained. The ds.graph() method can also be used to create a new graph (empty or not, eventually).

I.e., I believe it can be made to work. By using the default_context trick (probably aliased to default_graph in the Dataset class) we can take care of the union issue. It takes some care, but I now believe this can all work.

The TriG serializer will have to be updated to find the default graph properly; and I believe the TriG parser will have to be updated to produce a dataset and not a CG (I hope to be able to do those, but I am not sure I can manage before I run out of time before my vacation...).

I can share the initial code I did in a new version of the dataset, but I did not put into github yet because it is rough...

Ivan

Jun 21 '13 08:06 iherman

I am not sure what you mean. [...]The dataset maintains a name->graph table which has to updated if such a new graph comes into the picture, so the list of graphs that do exist are maintained.

This mapping dict is the problem, it is only stored in the dataset class so it won't be persisted. If I create a dataset on top of a sleepycat store, add some stuff, close it and reopen it later, the information about which graphs existed is lost.

Jun 21 '13 08:06 gromgull

Ouch. You are right. Which may mean that all our storage plugins must be updated. Or we do some internal hack: we set up a separate context with some internal name that has this mapping encoded as RDF triples. It is a hack that serializers have to know about...
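The internal-context hack can be sketched like this: instead of a name-to-graph dictionary held by the dataset object, each graph's existence is recorded as an ordinary triple in a reserved context of the store, so it survives a close and reopen. Everything here is hypothetical: `QuadStore` only simulates a persistent store, and the registry context IRI and the `urn:x-toy:Graph` class are made up for the example.

```python
from collections import defaultdict

# Hypothetical internal context and vocabulary; a serializer would have to skip this context.
REGISTRY_CONTEXT = "urn:x-toy:graph-registry"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GRAPH_CLASS = "urn:x-toy:Graph"


class QuadStore:
    """Stands in for a persistent context-aware store: context -> set of triples."""

    def __init__(self):
        self.contexts = defaultdict(set)

    def add(self, triple, context):
        self.contexts[context].add(triple)


class RegistryDataset:
    """Keeps no graph list of its own; everything it knows is written to the store."""

    def __init__(self, store):
        self.store = store

    def graph(self, name):
        # Record the graph's existence as an ordinary triple in the internal context,
        # so it survives even while the graph itself is empty.
        self.store.add((name, RDF_TYPE, GRAPH_CLASS), REGISTRY_CONTEXT)
        return self.store.contexts[name]

    def graphs(self):
        declared = {s for s, p, o in self.store.contexts.get(REGISTRY_CONTEXT, set())
                    if p == RDF_TYPE and o == GRAPH_CLASS}
        used = {c for c, ts in self.store.contexts.items()
                if ts and c != REGISTRY_CONTEXT}
        return declared | used


store = QuadStore()
ds = RegistryDataset(store)
ds.graph("ex:empty")
ds.graph("ex:full").add(("ex:s", "ex:p", "ex:o"))
del ds  # drop the dataset object; only the store remains

reopened = RegistryDataset(store)  # simulates closing and reopening the store
print(sorted(reopened.graphs()))   # ['ex:empty', 'ex:full']
```

The cost is the one noted above: every serializer, and anything else that walks all contexts, must know to skip the reserved registry context.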

Any other approach you can see?

Ivan

Jun 21 '13 09:06 iherman

Hm, things are getting more and more complicated:-( So I have another question.

Based on some of the comments on CG-s you guys convinced me that it is better to make a Dataset that is a subclass of CG. The main argument I heard is that back end stores are made 'context aware' and, by doing so, the Dataset implementation can make use of that.

However... what this means is that the 'graph' in the Dataset terminology should be mapped against the 'context' terminology of a CG. That also means that if I do a ds.add_quad(...), or a ds.addN(...), I can map that against either the corresponding CG method, or I can go directly to the store.

So far so good. However, from a user point of view, based on the RDF 1.1 terminology, this is actually unnatural. A Dataset is a collection of graphs, and not a collection of quads. What is natural is, for example:

ds = Dataset()
g1 = ds.graph('http://blabla')
g1.add((s, p, o))

ie, that a user operates on Graphs. In terms of a CG, g1 is a context and, as I learned the other day, it is also a Graph instance. However, is it o.k. for the user to operate on graphs directly? Will the context aware store know that the 'add' operation happens on a context and do the right thing? Indeed, the interface of CG is made in such a way that only quads are supposed to be added, not addressing the individual graphs...

Am I worried too much? :-)

Ivan

Ivan Herman, W3C Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 http://www.ivan-herman.net/foaf#me

Jun 25 '13 12:06 iherman

@iherman: No need to worry. Graph.add is implemented like this:

    def add(self, triple):
        s, p, o = triple  # unpack explicitly (no tuple parameters in Python 3)
        self.__store.add((s, p, o), self, quoted=False)

This means that Graph.add gives the store the triple to be added and also tells it to which context (self in this case) it has to be added. All operations on a Graph happen in its own context inside the store, hence the name 'context'. This is even supposed to work with Graph.query and Graph.update in a natural way.
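A toy model of why operating on a graph directly is safe: the graph view holds only a store reference and an identifier, and passes itself as the context on every call. `ToyStore` and `ToyGraph` are hypothetical stand-ins, not rdflib's `Store` and `Graph` classes.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical context-aware store: context identifier -> set of triples."""

    def __init__(self):
        self._data = defaultdict(set)

    def add(self, triple, context):
        # The caller says which context the triple belongs to.
        self._data[context.identifier].add(triple)

    def triples(self, context):
        return set(self._data.get(context.identifier, set()))


class ToyGraph:
    """Hypothetical graph view: every operation is scoped to its own context."""

    def __init__(self, store, identifier):
        self.store = store
        self.identifier = identifier

    def add(self, triple):
        # Same shape as the Graph.add quoted above: pass self as the context.
        self.store.add(triple, self)

    def __len__(self):
        return len(self.store.triples(self))


store = ToyStore()
g1 = ToyGraph(store, "http://example.org/g1")
g1.add(("ex:s", "ex:p", "ex:o"))

same = ToyGraph(store, "http://example.org/g1")   # a second view of the same context
other = ToyGraph(store, "http://example.org/g2")
print(len(same), len(other))  # 1 0
```

Two views with the same identifier see the same triples, which matches the earlier observation that a graph returned by get_context is bound to the same store.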

Jun 25 '13 13:06 uholzer

You made my day:-) But better safe than sorry.

I am making little progress, unfortunately, but progress nevertheless. At the moment the only problematic issue (apart from debugging and such) is what to do with empty graphs stored in a DS.

Thanks!

Ivan

Jun 25 '13 13:06 iherman

Do we have an update on this after I merged #309?

In https://github.com/RDFLib/rdflib/commit/6c026d0922a392efdc1434122c081a383d9c415f#L0R1597 I added a small fix for DataSet, in the future I plan to fix up the Parser interface, so that it is also possible to return more than one graph when parsing, so that parsing trig/trix etc. also should work correctly. Probably rolled in with #283

Aug 11 '13 18:08 gromgull

Executing

from rdflib import Dataset, URIRef
d = Dataset()
g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle', publicID=URIRef('g:g'))
print("After parse:")
for h in d.contexts(): print(h)
if g not in d.contexts(): print("g not in contexts")
for h in d.contexts(): print(h in d.contexts())

in python2.7 and python3.2 yields

After parse:
<g:g> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].
<urn:x-rdflib:default> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].
g not in contexts
True
True

Although that's better, I don't understand why g not in d.contexts().

Aug 12 '13 18:08 uholzer

I don't understand why g not in d.contexts().

That's because Dataset.parse() doesn't return a graph and so g is None:

>>> from rdflib import Dataset, URIRef
>>> d = Dataset()
>>> g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle', publicID=URIRef('g:g'))
>>> assert g is not None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Adding an explicit return to https://github.com/RDFLib/rdflib/blob/master/rdflib/graph.py#L1597 fixes the issue:

>>> from rdflib import Dataset, URIRef
>>> d = Dataset()
>>> g = d.parse(data='<a:a> <b:b> <c:c> .', format='turtle', publicID=URIRef('g:g'))
>>> assert g is not None
>>> print("After parse:")
After parse:
>>> for h in d.contexts():
...     print("Context", h)
... 
Context <g:g> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].
Context <urn:x-rdflib:default> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].
>>> print("g", g)
g <g:g> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].
>>> if g not in d.contexts():
...     print("g not in contexts")
... 
>>> for h in d.contexts():
...     print(h in d.contexts())
... 
True
True
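The underlying mistake is ordinary Python behaviour: a method that ends without a `return` statement returns `None`. A toy illustration, not rdflib's actual `Dataset.parse`; `ToyDataset` and both method names are hypothetical.

```python
class ToyDataset:
    """Hypothetical dataset: graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = {}

    def graph(self, name):
        return self.graphs.setdefault(name, set())

    def parse_without_return(self, triples, public_id):
        # The triples are stored, but the method falls off the end,
        # so the caller receives None.
        self.graph(public_id).update(triples)

    def parse(self, triples, public_id):
        g = self.graph(public_id)
        g.update(triples)
        return g  # the explicit return hands the target graph back to the caller


ds = ToyDataset()
print(ds.parse_without_return({("a:a", "b:b", "c:c")}, "g:g"))  # None
g = ds.parse({("a:a", "b:b", "c:c")}, "g:g")
print(g is ds.graphs["g:g"])  # True
```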

Nov 08 '13 02:11 ghost

Am I right in assuming that the only remaining part of this is that if you parse some context-aware RDF format into a dataset and your input file contains several graphs, they may not all appear? If so, we should close this and open a new issue for that. With fewer comments :)
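The expected behaviour for a multi-graph input is that every graph named in the data ends up in the dataset, not just one. The sketch below is a deliberately simplified, hypothetical N-Quads-style reader (`parse_simple_nquads`), not rdflib's parser: it only handles whitespace-separated IRI terms and groups statements by their optional fourth term.

```python
from collections import defaultdict

DEFAULT = "DEFAULT"  # hypothetical key for the default graph


def parse_simple_nquads(text):
    """Group statements by graph name.

    Deliberately simplified: whitespace-separated IRI terms only, no literals,
    blank nodes, trailing comments or escapes.
    """
    graphs = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not line.endswith("."):
            raise ValueError(f"statement must end with '.': {line!r}")
        terms = line[:-1].split()
        if len(terms) == 3:
            s, p, o = terms
            name = DEFAULT  # no graph term: the statement belongs to the default graph
        elif len(terms) == 4:
            s, p, o, name = terms
        else:
            raise ValueError(f"expected 3 or 4 terms: {line!r}")
        graphs[name].add((s, p, o))
    return dict(graphs)


data = """
<a:a> <b:b> <c:c> .
<a:a> <b:b> <d:d> <g:one> .
<x:x> <y:y> <z:z> <g:two> .
"""
result = parse_simple_nquads(data)
print(sorted(result))  # ['<g:one>', '<g:two>', 'DEFAULT']
```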

Dec 30 '13 20:12 gromgull

Is this still current?

May 15 '21 02:05 aucampia

I don't think all of this is still relevant, but there are definitely gotchas that are not super obvious until you explore them fully (especially with the removal of publicID from Dataset.parse as of rdflib 7.0).

I put together a gist for some of the gotchas. Happy to extend it with additional examples (e.g. stores) if people are interested. https://gist.github.com/zwelz3/a861c8c6961d4335763f389b63ec8a90

Feb 23 '24 17:02 zwelz3
