Graph parse method overrides prefix bindings
If I run the following code:
from rdflib import Graph, Namespace
data = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ns: <https://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
ns:08429fce-4d70-4be4-9c64-ffc80f554ea7
a skos:Concept .
"""
EX = Namespace("https://example.com/")
graph = Graph()
graph.bind("ex", EX)
graph.parse(data=data, format="turtle")
graph.print(format="turtle")
It will print:
@prefix ns: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
ns:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept ;
    skos:definition "definition" ;
    skos:prefLabel "label" .
I would have expected the bind to persist through the life of the graph object and print the following result:
@prefix ex: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept ;
    skos:definition "definition" ;
    skos:prefLabel "label" .
If I swap the two lines from:
graph.bind("ex", EX)
graph.parse(data=data, format="turtle")
to:
graph.parse(data=data, format="turtle")
graph.bind("ex", EX)
it then prints what I expect.
Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?
> Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?
It is, according to what I found while testing the override/replace interactions. I recorded an observation on the matter:
https://github.com/RDFLib/rdflib/blob/05dced203f7db28470255ce847db6b38d05a2663/test/test_graph/test_namespace_rebinding.py#L113
which is intended to find its way into the documentation.
My sense is that the traditional RDFLib invocation idiom models the domain (the source as a serialized RDF graph) slightly more faithfully:
g = Graph().parse(data=source, format=format)
There doesn't seem to be any elegant and straightforward way of handling prefix-namespace bindings in a format-independent manner. The turtle parser doesn't use either override or replace: https://github.com/RDFLib/rdflib/blob/05dced203f7db28470255ce847db6b38d05a2663/rdflib/plugins/parsers/notation3.py#L1946
Also consider:
from rdflib import Graph, Namespace


def test_parse_namespace():
    data = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX ns: <https://example.com/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <https://schema.org/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    ns:08429fce-4d70-4be4-9c64-ffc80f554ea7
        a skos:Concept .
    """
    EX = Namespace("https://example.com/")

    # Binding after parse(): the "ex" prefix wins over the parsed "ns" prefix.
    graph = Graph()
    graph.parse(data=data, format="turtle")
    graph.bind("ex", EX)
    assert graph.serialize(format="turtle") == (
        "@prefix ex: <https://example.com/> .\n"
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "\n"
        "ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept .\n"
        "\n"
    )

    graph2 = Graph()
    graph2 += graph  # Namespace bindings in graph not preserved
    assert graph2.serialize(format="turtle") == (
        "\n"
        "<https://example.com/08429fce-4d70-4be4-9c64-ffc80f554ea7> a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
        "\n"
    )

    graph2 = Graph()
    graph2.bind("xe", EX)
    graph2 += graph  # Namespace bindings in graph2 preserved
    assert graph2.serialize(format="turtle") == (
        "@prefix xe: <https://example.com/> .\n"
        "\n"
        "xe:08429fce-4d70-4be4-9c64-ffc80f554ea7 a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
        "\n"
    )
I think your expectation is reasonable @edmondchuc, but changing this would be a breaking change, so it should be targeted for 7.x; see https://github.com/RDFLib/rdflib/pull/2108 for some options.