rdflib-jsonld
Valid json+ld produces an empty graph
import json
from rdflib import Namespace, Graph, RDF, XSD, URIRef, plugin, Literal
from rdflib.serializer import Serializer
js = """{"@context":"https://schema.org","@graph":[{"@type":"Organization","@id":"https://example.com/#organization","url":"https://example.com/","name":"Move Ahead","sameAs":[]},{"@type":"WebSite","@id":"https://example.com/#website","url":"https://example.com/","name":"Move Ahead","publisher":{"@id":"https://example.com/#organization"},"potentialAction":{"@type":"SearchAction","target":"https://example.com/?s={search_term_string}","query-input":"required name=search_term_string"}},{"@type":"CollectionPage","@id":"https://example.com/category/lease/#collectionpage","url":"https://example.com/category/lease/","inLanguage":"en-US","name":"Leasing","isPartOf":{"@id":"https://example.com/#website"},"description":"Reallybig Leasing Co. A leading global transportation services provider"}]}"""
g = Graph().parse(data=js,format='json-ld')
len(g)
and the length is zero.
If I instead loop over json.loads(js).get('@graph') and parse each item separately, I can build a graph, but it doesn't resolve properly: the connected data is missing, and the rdf:type values are resolved without the context:
>>> for item in json.loads(js).get('@graph'):
...     g += Graph().parse(data=json.dumps(item), format='json-ld')
...
>>> for row in g.query("SELECT * where {?s ?p ?o}"):
...     print(row)
...
(rdflib.term.URIRef(u'https://example.com/#website'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/WebSite'))
(rdflib.term.URIRef(u'https://example.com/#organization'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/Organization'))
(rdflib.term.URIRef(u'https://example.com/category/lease/#collectionpage'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/CollectionPage'))
which clearly isn't going to work. If I inspect the individual items in the @graph list, the missing information is nested:
{u'url': u'https://example.com/', u'sameAs': [], u'@id': u'https://example.com/#organization', u'@type': u'Organization', u'name': u'Move Ahead'}
{u'publisher': {u'@id': u'https://example.com/#organization'}, u'potentialAction': {u'query-input': u'required name=search_term_string', u'@type': u'SearchAction', u'target': u'https://example.com/?s={search_term_string}'}, u'name': u'Move Ahead', u'url': u'https://example.com/', u'@id': u'https://example.com/#website', u'@type': u'WebSite'}
{u'inLanguage': u'en-US', u'name': u'Leasing', u'url': u'https://example.com/category/lease/', u'isPartOf': {u'@id': u'https://example.com/#website'}, u'@id': u'https://example.com/category/lease/#collectionpage', u'@type': u'CollectionPage', u'description': u'Reallybig Leasing Co. A leading global transportation services provider'}
Is there something I am missing here? This application reads json+ld found in the wild, so I can't control the input, but is there some reliable way to massage the input so that it works with the rdflib parser?
The json+ld works fine for Google
I found my work-around: if I copy the @context into each of the @graph items and parse them one at a time, then combine the results, it resolves as expected
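For anyone landing here, a minimal sketch of that work-around follows. The helper names inline_context and parse_items are made up for illustration and are not part of rdflib or rdflib-jsonld. It assumes the json-ld parser plugin is installed, and that a remote @context such as https://schema.org can be fetched when the pieces are parsed.

```python
import json


def inline_context(doc_text):
    """Return one JSON-LD string per @graph item, each carrying the
    top-level @context. Hypothetical helper, not an rdflib API."""
    doc = json.loads(doc_text)
    context = doc.get("@context")
    items = doc.get("@graph")
    if items is None:
        # No @graph wrapper: nothing to split, return the input unchanged.
        return [doc_text]
    pieces = []
    for item in items:
        merged = dict(item)
        # Assumption: an item that already has its own @context keeps it.
        if context is not None and "@context" not in merged:
            merged["@context"] = context
        pieces.append(json.dumps(merged))
    return pieces


def parse_items(doc_text):
    """Parse each piece on its own and combine the results into one graph."""
    # Imported here so inline_context stays usable without rdflib installed.
    from rdflib import Graph

    g = Graph()
    for piece in inline_context(doc_text):
        g += Graph().parse(data=piece, format="json-ld")
    return g
```

With the js string from the first post, parse_items(js) should give a non-empty graph, provided the context URL is reachable. I have only tried this against documents with a single top-level @graph.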
I had the same issue when writing tests: the json-ld that was generated by the rdflib-jsonld serializer could not be directly parsed with the rdflib-jsonld parser.
It seems I get this problem only when the @graph key is in the json object.