rdflib-jsonld
Not supporting @base with relative URI reference value
On MacOS, using Python 2.7.5 with:
rdflib==4.2.1
rdflib-jsonld==0.3
The following file parses and returns the expected triples:
{ "@id": "http://example.com/sampledata/testsite/_annalist_site/../"
, "@base": "http://example.com/sampledata/testsite/_annalist_site/./"
, "@context": "site_context.jsonld"
, "rdfs:label": "Annalist data notebook test site"
, "rdfs:comment": "Annalist test site metadata"
, "annal:comment": "Initialized from `sampledata/init/annalist_site/_annalist_site/site_meta.jsonld`"
}
This file also parses without error, but returns no triples:
{ "@id": "../"
, "@base": "http://example.com/sampledata/testsite/_annalist_site/./"
, "@context": "site_context.jsonld"
, "rdfs:label": "Annalist data notebook test site"
, "rdfs:comment": "Annalist test site metadata"
, "annal:comment": "Initialized from `sampledata/init/annalist_site/_annalist_site/site_meta.jsonld`"
}
The context file I'm using is this:
{
  "@context": {
    "@base": ".",
    "annal": "http://purl.org/annalist/#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}
In case it's relevant, the code I'm using to do the parsing test looks like this:
import os
from rdflib import Graph
g = Graph()
s = self.testsite._read_stream()
b = "file://" + os.path.join(TestBaseDir, layout.SITEDATA_DIR) + "/"
result = g.parse(source=s, publicID=b, format="json-ld")
print("***** g:")
print(g.serialize(format='turtle', indent=4))
(The line s = ... returns a file-like object returning my purported JSON-LD data.)
This is important for me as I'm trying to create "portable" directory structures that can be interpreted as RDF using the current base URI of the data. My reading of the JSON-LD spec is that relative references should be allowed for "@id" as long as there is an active base URI. (I also want the @base to be relative to the document location, but that's a separate concern.)
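As a rough illustration of the resolution being expected here, the standard library's `urljoin` applies RFC 3986 reference resolution. This is only a sketch of the expected result, written for Python 3 (the report above uses Python 2.7, where the import is `from urlparse import urljoin`); it says nothing about how rdflib-jsonld resolves references internally.

```python
from urllib.parse import urljoin

# Base URI copied from the "@base" value in the examples above.
base = "http://example.com/sampledata/testsite/_annalist_site/./"

# Resolve the relative "@id" value against that base (RFC 3986, section 5).
resolved_id = urljoin(base, "../")
print(resolved_id)  # http://example.com/sampledata/testsite/
```

The result matches the subject the absolute "@id" in the first file identifies once its dot segments are removed.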
After peeking at the source code, it turns out that if I put an absolute base URI in the context file, triples are generated. This works for me, but I think it's contrary to the JSON-LD spec that (I think) says @base values in external contexts should be ignored.
I.e. these files generate the triples I'm trying to represent:
Source:
{ "@id": "../"
, "@base": "./"
, "@context": "site_context.jsonld"
, "rdfs:label": "Annalist data notebook test site"
, "rdfs:comment": "Annalist test site metadata"
, "annal:comment": "Initialized from `sampledata/init/annalist_site/_annalist_site/site_meta.jsonld`"
}
Context:
{
  "@context": {
    "@base": "http://example.com/sampledata/data/annalist_site/_annalist_site/",
    "annal": "http://purl.org/annalist/#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}
Looking at the JSON-LD parser code, my hypothesis is that the problem is around _to_rdf_id in parser.py, specifically the call of context.resolve(id_val):
    def _to_rdf_id(self, context, id_val):
        bid = self._get_bnodeid(id_val)
        if bid:
            return BNode(bid)
        else:
            uri = context.resolve(id_val)
            if not self.generalized_rdf and ':' not in uri:
                return None
            return URIRef(uri)
It appears to me that the relative resolution is performed entirely by the context object which (I guess) does not have access to any @base specified in the current node, and which does not appear to make any distinction between internal and external contexts.
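A minimal sketch of why this hypothesis would produce an empty graph: if the reference is never resolved to an absolute IRI, the `':' not in uri` guard in the quoted code drops the node. The function `to_rdf_id_sketch` is a hypothetical stand-in written for illustration, and the use of `urljoin` in place of `context.resolve` is an assumption, not the library's behaviour.

```python
from urllib.parse import urljoin

def to_rdf_id_sketch(base, id_val, generalized_rdf=False):
    """Hypothetical stand-in for the URI branch of _to_rdf_id (not library code)."""
    # Assumption: resolution is plain RFC 3986 joining against whatever base is known.
    uri = urljoin(base, id_val) if base else id_val
    # Same guard as in the quoted parser code: a result without ':' is
    # not an absolute IRI, so no node is produced.
    if not generalized_rdf and ":" not in uri:
        return None
    return uri

absolute_base = "http://example.com/sampledata/testsite/_annalist_site/"
with_absolute_base = to_rdf_id_sketch(absolute_base, "../")
with_relative_base = to_rdf_id_sketch("./", "../")

print(with_absolute_base)  # http://example.com/sampledata/testsite/
print(with_relative_base)  # None
```

With a relative base that was never made absolute, the subject is `None`, which would be consistent with a parse that succeeds but yields no triples.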
You are correct that the context class doesn't distinguish properly between internal and external contexts and that it should ignore @base in external ones. This is wrong and should be fixed.
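The rule being described can be sketched roughly as follows. This is not the rdflib-jsonld implementation; the function name `effective_base` and the `(is_remote, context_dict)` input shape are assumptions made for the example. It follows the JSON-LD 1.0 rules that `@base` in a remote context is ignored, a `null` value clears the base, and a relative value resolves against the current base.

```python
from urllib.parse import urljoin

def effective_base(document_base, contexts):
    """Return the base IRI after processing contexts in order.

    `contexts` is a list of (is_remote, context_dict) pairs; this shape is
    an assumption for the sketch, not rdflib-jsonld's data model.
    """
    base = document_base
    for is_remote, ctx in contexts:
        if is_remote or "@base" not in ctx:
            continue  # "@base" in a remote (external) context is ignored
        value = ctx["@base"]
        # null clears the base; anything else resolves against the current base
        base = None if value is None else urljoin(base or "", value)
    return base

# Hypothetical document location, used only for this example.
doc_base = "http://example.com/sampledata/testsite/_annalist_site/site_meta.jsonld"
contexts = [
    (True, {"@base": "http://example.com/other/"}),  # external: ignored
    (False, {"@base": "./"}),                         # local: resolved
]
result = effective_base(doc_base, contexts)
print(result)  # http://example.com/sampledata/testsite/_annalist_site/
```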
Note though that @base needs to be defined in a @context (according to JSON-LD 1.0, section 6.1 Base IRI). Thus, if you move @base into the @context (in an object of an array) in your original example, like:
"@context": [ "site_context.jsonld" ,
{"@base": "http://example.com/sampledata/testsite/_annalist_site/./"} ]
I believe your usage will work and be fully compliant.
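For documents that already carry a top-level "@base", the suggested change can be applied mechanically. The helper below, `move_base_into_context`, is hypothetical and written only to illustrate the restructuring; it is not part of rdflib-jsonld.

```python
import json

def move_base_into_context(node):
    """Return a copy of `node` with a top-level "@base" moved into a local context.

    Hypothetical helper for illustration only.
    """
    node = dict(node)
    if "@base" not in node:
        return node
    base = node.pop("@base")
    existing = node.get("@context", [])
    if not isinstance(existing, list):
        existing = [existing]
    # Append the local context last so it is processed after the external one.
    node["@context"] = existing + [{"@base": base}]
    return node

doc = {
    "@id": "../",
    "@base": "http://example.com/sampledata/testsite/_annalist_site/./",
    "@context": "site_context.jsonld",
    "rdfs:label": "Annalist data notebook test site",
}
fixed = move_base_into_context(doc)
print(json.dumps(fixed["@context"]))
```

The original dictionary is left unchanged; only the returned copy has the "@context" array form shown above.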
@niklasl: Oops, thanks! I had got it stuck in my head that @base could be applied directly to an element as well as in a context. That explains the unexpected results from my initial test case. I'll need to fix that in my code.
OK, I've tried putting the @base in an internal context value as suggested. If I use an absolute URI for the base value, all is well, but if I use a relative URI I'm back to the original problem of getting no triples returned. I am passing the desired base URI to the parser via the publicID= parameter.
The following source document works:
{ "@id": "../"
, "@context": [ "site_context.jsonld", {"@base": "file:///usr/workspace/github/gklyne/annalist/src/annalist_root/sampledata/data/annalist_site/_annalist_site/./"} ]
, "rdfs:label": "Annalist data notebook test site"
, "rdfs:comment": "Annalist test site metadata"
, "annal:comment": "Initialized from `sampledata/init/annalist_site/_annalist_site/site_meta.jsonld`"
}
with context:
{
  "@context": {
    "@base": "./",
    "annal": "http://purl.org/annalist/2014/#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}
and code:
g = Graph()
s = self.testsite._read_stream()
b = "file://" + os.path.join(TestBaseDir, layout.SITEDATA_DIR) + "/"
print("*****" + repr(b))
result = g.parse(source=s, publicID=b, format="json-ld")
print(g.serialize(format='turtle', indent=4))
Unfortunately, what I really want to avoid is putting the absolute URI in the source file. If I change it as follows, I get no triples:
{ "@id": "../"
, "@context": [ "site_context.jsonld", {"@base": "./"} ]
, "rdfs:label": "Annalist data notebook test site"
, "rdfs:comment": "Annalist test site metadata"
, "annal:comment": "Initialized from `sampledata/init/annalist_site/_annalist_site/site_meta.jsonld`"
}
UPDATE: the only way I can get the desired effect is to not include any base URI in the source file, and have an absolute base URI in the external context file.
I'm finding that the handling of external @base values is causing some inconsistencies, which I'm pretty sure could lead to interoperability failures.
The problem is that the supplied "publicID" value is used as a base for resolving the @context reference, but the @base within the context file is used for resolving the relative @id URI reference in the JSON-LD source file testcoll/d/testtype/entity1/entity-data.jsonld. This means that there are effectively two different base URIs being used for different purposes in the same JSON-LD file.
As before, I can't use a local (internal) @base with a relative reference.
If I drop the @base from the context file, then the supplied publicID value is used consistently throughout.
I've isolated some code and data that illustrates the issue at https://github.com/gklyne/annalist/tree/develop/spike/jsonld_context. The small python module there assumes rdflib and rdflib-jsonld are installed, and assumes the testcoll directory tree is present in the same directory as the Python source code. The key data files are testcoll/d/testtype/entity1/entity-data.jsonld and testcoll/d/coll_context.jsonld.
@gklyne I have now pushed some fixes and new tests for context handling (mainly 9978eac75 and f8870158). Do you have any opportunity to test your use case using the latest from master? (I ran your example code with these changes applied and it seemed to behave as expected, after adding {"@base": "../.."} to the local context.)
If things behave as expected, I'll incorporate this in a new release shortly.
Thanks... I'll take a look when I have opportunity to re-introduce @base into my code and tests (for now, my workaround is to avoid use of @base, but that doesn't work perfectly for every case). If your patch works for the isolated sample code, I think that's a good start.
On reflection, maybe the more useful thing for me to do is expand my "spike" code to include a range of test cases that I think I should be able to use, and see how that plays out?
If you have the time, that'd be a great way to check the implementation, as well as our comparative understandings and needs. (Most things are governed by the spec, but some might be controllable using parameters.)
@niklasl - Thanks! I've added that to my TODO list.
I'm not promising a timeframe, but I'm using this library for testing my JSON-LD generating code, so getting an interoperable (per spec) interpretation of @base is a significant want for me. Hopefully some time in the next couple of weeks.
Well, it took more than a "couple of weeks", but I think the latest fixes are working as expected, with one point I'd like to double-check:
From file https://github.com/gklyne/annalist/blob/develop/spike/jsonld_context/testsite/c/testcoll/_annalist_collection/types/testtype/type_meta.jsonld in my test case:
{ "@context": [
{ "@base": "../../" },
"../../coll_context.jsonld"
],
"@id": "types/testtype/",
"@type": [ "annal:Type" ],
:
}
The reference to the external context file is resolved against the supplied PublicID, not against the value of @base. I think this may be correct per spec (http://www.w3.org/TR/json-ld-api/, section 6.1), but I find the spec isn't clear on this point (and the main JSON-LD syntax spec appears to be completely silent on the issue :( ).
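To make the two readings concrete, here is a small sketch that resolves the context reference both ways using the standard library. The `public_id` value is hypothetical (only the path layout comes from the test case), and `urljoin` is used as a plain RFC 3986 resolver rather than as a model of the parser.

```python
from urllib.parse import urljoin

# Hypothetical public ID; the host is invented, the path mirrors the test case.
public_id = ("http://example.org/testsite/c/testcoll/_annalist_collection/"
             "types/testtype/type_meta.jsonld")
context_ref = "../../coll_context.jsonld"

# The local "@base" value, resolved against the document's public ID.
local_base = urljoin(public_id, "../../")

# Reading 1 (the behaviour observed): context reference resolved against the public ID.
against_public_id = urljoin(public_id, context_ref)
# Reading 2: context reference resolved against the local "@base" instead.
against_local_base = urljoin(local_base, context_ref)
# The node "@id" is resolved against the local "@base" in either reading.
node_id = urljoin(local_base, "types/testtype/")

print(against_public_id)   # .../testcoll/_annalist_collection/coll_context.jsonld
print(against_local_base)  # .../testsite/c/coll_context.jsonld
print(node_id)             # .../testcoll/_annalist_collection/types/testtype/
```

The two readings locate the context file two directory levels apart, so a document that works under one would fail to load its context under the other.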
My updated stand-alone spike code can be seen at: https://github.com/gklyne/annalist/tree/develop/spike/jsonld_context. It's basically a copy of some of my internal tests, reworked to isolate the code and data from the rest of the software system. The actual test module to run is read_jsonld.py.
The git status/git log output below shows the version of rdflib-jsonld I'm currently using for my tests:
(rdflibenv)conina:rdflib-jsonld graham$ git status
# On branch master
nothing to commit, working directory clean
(rdflibenv)conina:rdflib-jsonld graham$ git log
commit 3563caf75295d9ee382b54fb2f812c6c3edffee3
Author: Niklas Lindström <[email protected]>
Date: Sat Nov 28 12:00:40 2015 +0100
Add entry points under "application/ld+json" for parser and serializer
Fixes #36
commit 0f802ce8f8b9f5397bf3e7510c5a773bd6c632a3
Author: Niklas Lindström <[email protected]>
Date: Sat Nov 28 11:55:31 2015 +0100
Adapt version number to semver scheme
commit 8e922cbab0cca0fbf1ffef888c70c0e9824f8704
Author: Niklas Lindström <[email protected]>
Date: Mon Nov 2 14:13:31 2015 +0100
Change Term class to namedtuple
commit f8870158277d3aa96202ea5319a4a9a8b7eb8948
Author: Niklas Lindström <[email protected]>
Date: Mon Nov 2 14:02:26 2015 +0100
Resolve relative base in context against document base
BTW, is there a way to include comment strings in an external JSON-LD context? Mine are dynamically generated, and I'd like to be able to include a timestamp, etc. I guess I could define a fake namespace prefix, though I guess its value also has to be syntactically valid as a URI, e.g.
"context_comment": "comment:20151204:1230:created_by_foo"
[later] Actually, I think I've figured this: I think that anything outside the @context key value in an external context document is ignored. I've put my comment there and nothing seems to be breaking.
Ignore this comment - the problem does not appear in my spike code.
The spike code works fine under the new jsonld library revision, but I was testing my full software under the current release version, which is where I'm seeing the failure. I'm leaving the original report here for reference, but I also note that it appears to work in the updated software version.
I think I've just uncovered another problem with @base URI handling:
"@base": "/testsite/c/testcoll/d/"
does not work (parsing succeeds but returns an empty graph), but:
"@base": "http://example.org/testsite/c/testcoll/d/"
does work. I'm expecting the @base value to be resolved against the supplied public URI, which seems to work if it is a relative path, but in this case I have an absolute path that I want to use with the same scheme and authority parts as the supplied public URI.
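For comparison, this is the result that generic RFC 3986 resolution gives for an absolute-path reference. The `public_id` below is a hypothetical stand-in for the supplied public URI, and `urljoin` is used only to show the expected outcome, not how the parser computes it.

```python
from urllib.parse import urljoin

# Hypothetical public URI for the document being parsed.
public_id = "http://example.org/some/dir/entity-data.jsonld"

# An absolute-path reference keeps the scheme and authority of the base
# and replaces the whole path.
resolved_base = urljoin(public_id, "/testsite/c/testcoll/d/")
print(resolved_base)  # http://example.org/testsite/c/testcoll/d/
```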