Turtle serializer shouldn't write blank nodes as <...>

Open fennibay opened this issue 2 years ago • 8 comments

I'm converting JSON-LD to Turtle using rdflib.js.

Example input:

{
    "@context": {
        "ex": "http://example.com#"
    },
    "@id": "ex:myid",
    "ex:prop1": {
        "ex:prop2": {
            "ex:prop3": "value"
        }
    }
}

Example current output out of rdflib.js:

@prefix ex: <http://example.com#>.

<_:b0> ex:prop2 <_:b1>.
<_:b1> ex:prop3 "value".
ex:myid ex:prop1 <_:b0>.

The Turtle spec states the following:

RDF blank nodes in Turtle are expressed as _: followed by a blank node label which is a series of name characters.

So I think blank nodes should be written without <...>: the angle brackets turn them into absolute or relative IRIs rather than blank nodes.
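To make the expected behaviour concrete, here is a minimal sketch of how a serializer could render each kind of term. The `Term` type and the `termToTurtle` function are hypothetical names used only for illustration, not rdflib.js code, and the literal case ignores datatypes and language tags.

```typescript
// Illustrative term shapes; an assumption, not the rdflib.js classes.
type Term =
  | { termType: "NamedNode"; value: string }
  | { termType: "BlankNode"; value: string }
  | { termType: "Literal"; value: string };

// Render a single term in Turtle syntax.
function termToTurtle(term: Term): string {
  switch (term.termType) {
    case "NamedNode":
      // Only IRIs are wrapped in angle brackets.
      return `<${term.value}>`;
    case "BlankNode":
      // Blank nodes are written as "_:" plus the label, never in <...>.
      // Accept labels stored with or without the "_:" prefix.
      return term.value.indexOf("_:") === 0 ? term.value : `_:${term.value}`;
    case "Literal":
      // Plain string literal; JSON string escapes are also valid in Turtle.
      return JSON.stringify(term.value);
  }
}

// The second triple of the example above, rendered with full IRIs.
const line = [
  termToTurtle({ termType: "BlankNode", value: "b1" }),
  termToTurtle({ termType: "NamedNode", value: "http://example.com#prop3" }),
  termToTurtle({ termType: "Literal", value: "value" }),
].join(" ") + " .";
console.log(line);
```

With terms typed this way, the subject comes out as _:b1 rather than <_:b1>.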

As an additional feature, it would be nice to be able to control whether blank nodes are written nested or not nested.

Questions:

  1. Is this a known issue? I saw some non-conformances listed in #329, but couldn't find this exact case there.
  2. Could this be caused by the arguments I pass, in case I'm calling the functions wrong? My code snippet is below.

import * as rdflib from "rdflib"

/**
 * Convert JSON-LD to Turtle
 * @param input JSON string
 * @param base Base IRI for the content
 * @param namespaces The namespace map for use in ttl
 * @returns TTL string
 */
async function convertJsonLdToTtl(
    input: string,
    base: string,
    namespaces: Record<string, string> = {},
): Promise<string> {
    return new Promise<string>((res, rej) => {
        const store = rdflib.graph()
        rdflib.parse(input, store, base, "application/ld+json", (err, kb) => {
            if (err) {
                rej(err)
            } else {
                if (!kb) {
                    rej("KB empty: " + kb)
                } else {
                    console.log("KB # statements: " + kb.statements.length)
                    rdflib.serialize(
                        null,
                        kb,
                        undefined,
                        "text/turtle",
                        (err, output) => {
                            if (err) {
                                rej(err)
                            } else {
                                if (!output) {
                                    rej("Empty output: " + output)
                                } else {
                                    res(output)
                                }
                            }
                        },
                        {
                            namespaces,
                        },
                    )
                }
            }
        })
    })
}

Many thanks.

fennibay avatar Apr 13 '22 08:04 fennibay

I can confirm that <_:b0> is a NamedNode, not a BlankNode in Turtle. So this looks like a bug.

jeff-zucker avatar Apr 13 '22 22:04 jeff-zucker

Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

bourgeoa avatar May 03 '22 17:05 bourgeoa

> Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

Thanks for the hint. I tried first converting from JSON-LD to N-Quads (with another library, jsonld) and then converting to Turtle, which helped by embedding the blank nodes. The blank node labels may still be wrong (I couldn't test this), but my problem is solved for now.

fennibay avatar May 14 '22 19:05 fennibay

This is rather problematic for any system that uses rdflib.js to parse JSON-LD. Any chance this can get prioritized?

RinkeHoekstra avatar Jul 18 '22 09:07 RinkeHoekstra

I can confirm that e.g. the following JSON-LD is not parsed correctly:

{
    "@context": {
        "@vocab": "https://example.com/"
    },
    "hasExampleProperty": "some literal value"
}

Results in the following statement (I'm using an example IRI for the graph here):

{
    "subject": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "_:b0"
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

But clearly _:b0 should be a BlankNode.
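A quick way to check whether a parsed store is affected is to scan the statements for NamedNodes whose value looks like a blank node identifier. The sketch below works on plain objects shaped like the JSON dump above; `PlainTerm`, `PlainStatement` and `findMistypedBlankNodes` are hypothetical names, not part of rdflib.js.

```typescript
// Plain-object view of a term and a statement, matching the dump above.
interface PlainTerm {
  termType: string;
  value: string;
}

interface PlainStatement {
  subject: PlainTerm;
  predicate: PlainTerm;
  object: PlainTerm;
  graph?: PlainTerm;
}

// Collect the distinct values of NamedNodes that start with "_:",
// looking at subject and object positions only.
function findMistypedBlankNodes(statements: PlainStatement[]): string[] {
  const found: string[] = [];
  for (const st of statements) {
    for (const term of [st.subject, st.object]) {
      const suspicious =
        term.termType === "NamedNode" && term.value.indexOf("_:") === 0;
      if (suspicious && found.indexOf(term.value) === -1) {
        found.push(term.value);
      }
    }
  }
  return found;
}
```

An empty result means no term of this kind was found; any returned value points at a node that was probably meant to be a BlankNode.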

Whereas the corresponding Turtle is parsed correctly:

@prefix ex: <https://example.com/> .

[] ex:hasExampleProperty "some literal value" .

Becomes:

{
    "subject": {
        "termType": "BlankNode",
        "classOrder": 6,
        "value": "_g_L2C39",
        "isBlank": 1,
        "isVar": 1
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

(Interestingly, the blank node gets a completely different internal identifier in this case).

RinkeHoekstra avatar Jul 18 '22 10:07 RinkeHoekstra

When the JSON-LD contains a list, the blank nodes corresponding to that collection are generated correctly:

{
    "@context": {
        "@vocab": "https://example.com/",
        "hasExampleProperty": {
            "@container": "@list"
        }
    },
    "hasExampleProperty": ["some literal value", "some other literal value"]
}

As N-Quads:

_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some other literal value".
_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nill>.
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some literal value".
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:n4.
<_:b0> <https://example.com/hasExampleProperty> _:n5 <https://example.com/test/> .

RinkeHoekstra avatar Jul 18 '22 10:07 RinkeHoekstra

The function jsonldObjectToTerm does not appear to ever return a BlankNode:

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L11

RinkeHoekstra avatar Jul 18 '22 10:07 RinkeHoekstra

Diagnosis

It looks like the flatten function from jsonld.js is the culprit.

The JSON-LD parser takes the flattened output, and checks for @id attributes to determine whether the JSON object represents a blank node or not.

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L68-L83

and:

https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/jsonldparser.js#L24-L26

However, the flattened output of jsonld.js inserts @id attributes for blank nodes as well. For example, the JSON-LD above (without the list) results in:

[
  {
    "@id": "_:b0",
    "https://example.com/hasExampleProperty": [
      {
        "@value": "some literal value"
      }
    ]
  }
]

The parser then turns this node into a NamedNode because it has an @id attribute.

Identifying a blank node through an @id value is described in a non-normative section of the JSON-LD specification: https://www.w3.org/TR/json-ld11/#identifying-blank-nodes.

The flattened output (also non-normative) uses this in its examples: https://www.w3.org/TR/json-ld11/#flattened-document-form (and it needs to as it cannot use nesting to group the properties of the node together).

Proposed Solution

  • Do not rely on the presence of an @id attribute, as in flattened output it is always there, for named and blank nodes alike.
  • Use the standard syntax for blank nodes in JSON-LD to identify whether a JSON object is a blank node: any value of @id that starts with _: is a blank node.
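
The two points above could be sketched roughly as follows. This is not the actual rdflib.js patch: `nodeObjectToTerm`, `SubjectTerm` and the counter are hypothetical, and stripping the "_:" prefix from the stored label is an assumption about how labels would be kept internally.

```typescript
// Illustrative result type; an assumption, not the rdflib.js classes.
type SubjectTerm =
  | { termType: "NamedNode"; value: string }
  | { termType: "BlankNode"; value: string };

// Counter for minting labels when a node object has no @id at all.
let blankNodeCounter = 0;

// Decide the term type from the value of @id, not from its presence.
function nodeObjectToTerm(node: { "@id"?: string }): SubjectTerm {
  const id = node["@id"];
  if (id === undefined) {
    // No identifier given: mint a fresh blank node label.
    return { termType: "BlankNode", value: `n${blankNodeCounter++}` };
  }
  if (id.indexOf("_:") === 0) {
    // "_:b0" is a blank node identifier; keep the label without the prefix.
    return { termType: "BlankNode", value: id.slice(2) };
  }
  // Anything else is treated as an IRI.
  return { termType: "NamedNode", value: id };
}
```

Applied to the flattened output shown earlier, the node with "@id": "_:b0" would come back as a BlankNode instead of a NamedNode.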

RinkeHoekstra avatar Jul 18 '22 10:07 RinkeHoekstra