rdflib.js
Merging graphs with blank nodes
I'm using rdflib.js in a project to update an existing graph with new data. The new data overlaps with the old: a given 'device1' may already exist in the old graph, and because both graphs use the same IRI for it, the two descriptions refer to the same resource. We use blank nodes to represent the individual measurements. Old graph:
@prefix ns0: <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://basisLeshan.com/device1/3303/0>
a ns0:ObjectInstance;
ns0:consistsOf [
a ns0:5700, ns0:ResourceInstance ;
ns0:hasTimeStamp "2020-04-16T08:24:03.755Z"^^xsd:dateTime ;
ns0:hasValue "-2.5"^^xsd:float ;
ns0:organizedInto <http://basisLeshan.com/device1/3303/0>
] ;
ns0:containedBy <http://basisLeshan.com/device1> .
<http://basisLeshan.com/device1>
a ns0:Device ;
ns0:contains <http://basisLeshan.com/device1/3303/0> .
New graph:
@prefix ns0: <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://basisLeshan.com/device1/3303/0>
a ns0:ObjectInstance;
ns0:consistsOf [
a ns0:5700, ns0:ResourceInstance ;
ns0:hasTimeStamp "2020-04-16T08:26:32.988Z"^^xsd:dateTime ;
ns0:hasValue "-5.1"^^xsd:float ;
ns0:organizedInto <http://basisLeshan.com/device1/3303/0>
] ;
ns0:containedBy <http://basisLeshan.com/device1> .
<http://basisLeshan.com/device1>
a ns0:Device ;
ns0:contains <http://basisLeshan.com/device1/3303/0> .
(The only difference is the value and the timestamp.)
When I merge them, I would expect these two blank nodes to be kept separate, because nothing suggests they are the same node. This is also how rdflib in Python behaves.
Thus:
#!/usr/bin/env python3
from rdflib import Graph
# https://rdflib.readthedocs.io/en/stable/merging.html
g = Graph()
g.parse('old_graph.ttl', format='turtle')
g.parse('new_graph.ttl', format='turtle')
g.serialize('out.ttl', format='turtle')
gives:
@prefix ns0: <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://basisLeshan.com/device1> a ns0:Device ;
ns0:contains <http://basisLeshan.com/device1/3303/0> .
<http://basisLeshan.com/device1/3303/0> a ns0:ObjectInstance ;
ns0:consistsOf [ a <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#5700>,
ns0:ResourceInstance ;
ns0:hasTimeStamp "2020-04-16T08:24:03.755000+00:00"^^xsd:dateTime ;
ns0:hasValue "-2.5"^^xsd:float ;
ns0:organizedInto <http://basisLeshan.com/device1/3303/0> ],
[ a <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#5700>,
ns0:ResourceInstance ;
ns0:hasTimeStamp "2020-04-16T08:26:32.988000+00:00"^^xsd:dateTime ;
ns0:hasValue "-5.1"^^xsd:float ;
ns0:organizedInto <http://basisLeshan.com/device1/3303/0> ] ;
ns0:containedBy <http://basisLeshan.com/device1> .
This yields two separate blank nodes, as expected, and in line with the specs (if I understand them correctly):
Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this is supported by the syntax.
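To make the distinction concrete, here is a minimal, library-free sketch of the difference between a plain union and a merge that first gives each input's blank nodes fresh labels. It is not how either rdflib implementation works internally. The tuple representation, the `_:` prefix convention, the shortened predicate names, and the helper names `is_blank`, `merge` and `blank_nodes` are all assumptions made for illustration.

```python
from itertools import count


def is_blank(term):
    """Blank nodes are modelled here as strings with a Turtle-style '_:' prefix."""
    return isinstance(term, str) and term.startswith("_:")


def merge(*graphs):
    """Rename each input graph's blank nodes to fresh labels, then take the union."""
    fresh = count()  # one counter shared by all inputs, so labels never repeat
    merged = set()
    for graph in graphs:
        renamed = {}  # per-graph mapping: original label -> fresh label
        for triple in graph:
            new_triple = []
            for term in triple:
                if is_blank(term):
                    if term not in renamed:
                        renamed[term] = f"_:m{next(fresh)}"
                    term = renamed[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged


def blank_nodes(graph):
    """Collect every distinct blank node label used in a graph."""
    return {term for triple in graph for term in triple if is_blank(term)}


SENSOR = "http://basisLeshan.com/device1/3303/0"
# Both documents happen to use the same local label for their measurement node.
old = {(SENSOR, "consistsOf", "_:b0"), ("_:b0", "hasValue", "-2.5")}
new = {(SENSOR, "consistsOf", "_:b0"), ("_:b0", "hasValue", "-5.1")}

naive = old | new          # plain union: both readings land on the same '_:b0'
proper = merge(old, new)   # merge: each document keeps its own blank node

print(len(blank_nodes(naive)), len(blank_nodes(proper)))  # prints: 1 2
```

The plain union reproduces the collapsed result, while the renaming merge keeps one blank node per measurement.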
When I use rdflib.js, however, these blank nodes get merged into one:
const $rdf = require('rdflib');
const store = $rdf.graph();
$rdf.parse(old_graph, store, 'https://www.example.com/', 'text/turtle');
$rdf.parse(new_graph, store, 'https://www.example.com/', 'text/turtle');
console.log($rdf.serialize(null, store, 'https://www.example.com/', 'text/turtle'));
As you can see here:
@prefix : <#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix b: <http://basisLeshan.com/>.
@prefix om: <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#>.
@prefix n0: <http://basisLeshan.com/device1/3303/>.
b:thijs-Galago-Pro
a om:Device; om:contains n0:0.
n0:0
a om:ObjectInstance;
om:consistsOf
[
a om:5700, om:ResourceInstance;
om:hasTimeStamp
"2020-04-16T08:24:03.755Z"^^xsd:dateTime,
"2020-04-16T08:26:32.988Z"^^xsd:dateTime;
om:hasValue "-2.5"^^xsd:float, "-5.1"^^xsd:float;
om:organizedInto n0:0
];
om:containedBy b:device1.
To my understanding, this does not conform to the RDF specs. So is this a bug, or is there another way this should be done? If anything would help clarify my question, just ask!
Maybe worse: if I say the old and new graphs come from different documents (see the code below):
const $rdf = require('rdflib');
const store = $rdf.graph();
$rdf.parse(old_graph, store, 'https://www.example.com/1', 'text/turtle');
$rdf.parse(new_graph, store, 'https://www.example.com/2', 'text/turtle');
console.log($rdf.serialize(null, store, 'https://www.example.com/', 'text/turtle'));
I get this:
@prefix : </#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix b: <http://basisLeshan.com/>.
@prefix om: <https://florsanders.inrupt.net/public/ontologies/omalwm2m.ttl#>.
@prefix n0: <http://basisLeshan.com/device1/3303/>.
b:device1
a om:Device, om:Device;
om:contains n0:0, n0:0.
n0:0
a om:ObjectInstance, om:ObjectInstance;
om:consistsOf _:_g_L5C354, _:_g_L5C354;
om:containedBy b:device1, b:device1.
_:_g_L5C354
a om:5700, om:5700, om:ResourceInstance, om:ResourceInstance;
om:hasTimeStamp
"2020-04-16T08:24:03.755Z"^^xsd:dateTime,
"2020-04-16T08:26:32.988Z"^^xsd:dateTime;
om:hasValue "-2.5"^^xsd:float, "-5.1"^^xsd:float;
om:organizedInto n0:0, n0:0.
This means that all triples are doubled (see, for example, b:device1 a om:Device, om:Device;).
But the blank node is still merged! I don't think this is how it should behave.
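One quick way to spot this kind of collapse in a merged graph is to look for blank nodes that carry more than one value for a property that should appear once per measurement. The sketch below is library-free and works on plain tuples. The helper name `collapsed_nodes`, the shortened predicate names, and the assumption that each measurement has exactly one `hasValue` are mine, for illustration only.

```python
from collections import defaultdict


def collapsed_nodes(triples, predicate="hasValue"):
    """Return blank nodes that carry more than one object for `predicate`."""
    values = defaultdict(set)
    for subject, pred, obj in triples:
        if subject.startswith("_:") and pred == predicate:
            values[subject].add(obj)
    return {node: sorted(vals) for node, vals in values.items() if len(vals) > 1}


# Triples taken from the merged output above, with predicates shortened.
merged_output = [
    ("_:_g_L5C354", "hasValue", "-2.5"),
    ("_:_g_L5C354", "hasValue", "-5.1"),
    ("_:_g_L5C354", "hasTimeStamp", "2020-04-16T08:24:03.755Z"),
    ("_:_g_L5C354", "hasTimeStamp", "2020-04-16T08:26:32.988Z"),
]

print(collapsed_nodes(merged_output))
# prints: {'_:_g_L5C354': ['-2.5', '-5.1']}
```

An empty result would mean every measurement node still has a single value, which is what a correct merge of the two graphs should produce.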
I'm hitting the same problem. What is the best way to work around this?
The ID update in BlankNode at this line appears to be the culprit:
https://github.com/linkeddata/rdflib.js/blob/c14dfd57d5159ad5ac1ee2523cc7924968e24f53/src/blank-node.ts#L35
I think that the abstract nextId counter is not synchronised across class instances when data is loaded asynchronously?