Graph parse method overrides prefix bindings

Open · edmondchuc opened this issue 3 years ago · 1 comment

If I run the following code:

from rdflib import Graph, Namespace

data = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ns: <https://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

ns:08429fce-4d70-4be4-9c64-ffc80f554ea7
    a skos:Concept .
"""

EX = Namespace("https://example.com/")

graph = Graph()

graph.bind("ex", EX)
graph.parse(data=data, format="turtle")

graph.print(format="turtle")

It will print:

@prefix ns: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ns:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept .

I would have expected the bind to persist through the life of the graph object and print the following result:

@prefix ex: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept .

If I swap the two lines from:

graph.bind("ex", EX)
graph.parse(data=data, format="turtle")

to:

graph.parse(data=data, format="turtle")
graph.bind("ex", EX)

it then prints what I expect.

Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?

edmondchuc, Jun 22 '22

> Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?

It is, according to what I found while testing how the override and replace options interact, and I included an observation on the matter here:

https://github.com/RDFLib/rdflib/blob/05dced203f7db28470255ce847db6b38d05a2663/test/test_graph/test_namespace_rebinding.py#L113

which is intended to find its way into the documentation.

My sense is that the traditional RDFLib invocation idiom models the domain (the source as a serialized RDF graph) slightly more faithfully:

g = Graph().parse(data=source, format=format)

There doesn't seem to be any elegant, straightforward way of handling prefix-namespace bindings in a format-independent manner. The Turtle parser uses neither the override nor the replace argument when binding prefixes: https://github.com/RDFLib/rdflib/blob/05dced203f7db28470255ce847db6b38d05a2663/rdflib/plugins/parsers/notation3.py#L1946
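To illustrate why the order of bind() and parse() matters, here is a deliberately simplified, pure-Python model of a two-way prefix/namespace table in which bind() defaults to override=True. ToyNamespaceTable and parse_prefixes are hypothetical and are not RDFLib's implementation; the model ignores replace and RDFLib's prefix-collision renaming, and only shows that a parser calling bind() with the defaults behaves exactly like a later user call.

```python
class ToyNamespaceTable:
    """Hypothetical one-prefix-per-namespace table (not RDFLib code)."""

    def __init__(self):
        self.prefix_to_ns = {}
        self.ns_to_prefix = {}

    def bind(self, prefix, namespace, override=True):
        old_prefix = self.ns_to_prefix.get(namespace)
        if old_prefix is not None and old_prefix != prefix:
            if not override:
                return  # keep the existing prefix for this namespace
            del self.prefix_to_ns[old_prefix]  # e.g. evict "ex" in favour of "ns"
        old_ns = self.prefix_to_ns.get(prefix)
        if old_ns is not None and old_ns != namespace:
            del self.ns_to_prefix[old_ns]  # prefix moves to a new namespace
        self.prefix_to_ns[prefix] = namespace
        self.ns_to_prefix[namespace] = prefix


def parse_prefixes(table, declared):
    # Stand-in for a parser: each PREFIX declaration triggers bind()
    # with the default override=True, just like a user call would.
    for prefix, namespace in declared:
        table.bind(prefix, namespace)


EXAMPLE = "https://example.com/"

before = ToyNamespaceTable()
before.bind("ex", EXAMPLE)
parse_prefixes(before, [("ns", EXAMPLE)])
print(before.ns_to_prefix[EXAMPLE])  # ns: the parser's binding won

after = ToyNamespaceTable()
parse_prefixes(after, [("ns", EXAMPLE)])
after.bind("ex", EXAMPLE)
print(after.ns_to_prefix[EXAMPLE])  # ex: the later user binding won

kept = ToyNamespaceTable()
kept.bind("ex", EXAMPLE)
kept.bind("ns", EXAMPLE, override=False)  # a hypothetical non-overriding parser
print(kept.ns_to_prefix[EXAMPLE])  # ex: the earlier binding survives
```

In this model "last bind wins" falls out of override=True being the default for every caller, which matches the behaviour reported in the issue.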

Also consider:

from rdflib import Graph, Namespace


def test_parse_namespace():
    data = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX ns: <https://example.com/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <https://schema.org/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    ns:08429fce-4d70-4be4-9c64-ffc80f554ea7
        a skos:Concept .
    """

    EX = Namespace("https://example.com/")

    graph = Graph()

    graph.parse(data=data, format="turtle")
    graph.bind("ex", EX)

    assert graph.serialize(format="turtle") == (
        "@prefix ex: <https://example.com/> .\n"
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "\n"
        "ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept .\n"
        "\n"
    )

    graph2 = Graph()
    graph2 += graph  # Namespace bindings in graph not preserved

    assert graph2.serialize(format="turtle") == (
        "\n"
        "<https://example.com/08429fce-4d70-4be4-9c64-ffc80f554ea7> a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
        "\n"
    )

    graph2 = Graph()
    graph2.bind("xe", EX)

    graph2 += graph  # Namespace bindings in graph2 preserved

    assert graph2.serialize(format="turtle") == (
        "@prefix xe: <https://example.com/> .\n"
        "\n"
        "xe:08429fce-4d70-4be4-9c64-ffc80f554ea7 a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
        "\n"
    )

ghost, Jun 22 '22

I think your expectation is reasonable, @edmondchuc, but changing this would be a breaking change, so it should be targeted for 7.x; see https://github.com/RDFLib/rdflib/pull/2108 for some options.

aucampia, Mar 20 '23