hdt-cpp
Literal with ^@ results in error
Hi,
When I try to convert the following Turtle file I get an error:
@prefix : <https://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:1 a :Project;
rdfs:label "Escala^@"@es;
:startYear "2015"^^xsd:gYear .
The error that I receive is
Input format not given. Guessing from file extension...
Detected RDF input format: ttl
Catch exception load: ERROR: Could not convert triple to IDS!
https://example.org/1 http://www.w3.org/2000/01/rdf-schema#label "Escala
ERROR: ERROR: Could not convert triple to IDS!
https://example.org/1 http://www.w3.org/2000/01/rdf-schema#label "Escala
I used the latest version of the develop branch and executed
./rdf2hdt /tmp/input.ttl /tmp/test.hdt -v
Interesting to note is that when removing the year from the data the error does not appear.
Can you try with Serd? What's your Serd version?
My Serd version is 0.30.2.
When executing the following
serdi input.ttl
I get
<https://example.org/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://example.org/Project> .
<https://example.org/1> <http://www.w3.org/2000/01/rdf-schema#label> "Escala\u0000"@es .
<https://example.org/1> <https://example.org/startYear> "2015"^^<http://www.w3.org/2001/XMLSchema#gYear> .
So the ^@ gets converted to \u0000, so it looks like something goes wrong here, no?
It seems the default parser in HDT is somehow activated, which does not take this case into account. However, I don't understand why Serd is not used here even though it is available; or maybe this parsing happens afterwards.
At least, things go wrong here: https://github.com/rdfhdt/hdt-cpp/blob/develop/libhdt/src/dictionary/PlainDictionary.cpp#L108 and here https://github.com/rdfhdt/hdt-cpp/blob/develop/libhdt/include/SingleTriple.hpp#L276
The object is not inserted successfully in the dictionary, but the insertion fails silently. Afterwards, when the incomplete triple is encountered, the error is thrown.
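One possible explanation for the truncated "Escala in the error output, not verified against the hdt-cpp source: somewhere between the parser and the dictionary the literal may pass through a `const char*` without an explicit length, which cuts it off at the NUL character. The sketch below only illustrates that general C++ behaviour. The helper names (labelWithNul, throughCString, throughSizedBuffer) are made up for this example and do not exist in hdt-cpp.

```cpp
#include <cstddef>
#include <string>

// The lexical form from the report: "Escala" followed by U+0000.
// Passing the length explicitly keeps the embedded NUL inside the std::string.
inline std::string labelWithNul() {
    return std::string("Escala\0", 7);
}

// Hypothetical lossy hop: constructing from a bare `const char*`
// stops reading at the first NUL byte, so the trailing U+0000 is lost.
inline std::string throughCString(const std::string& s) {
    return std::string(s.c_str());
}

// Length-preserving hop: copy using data() together with size().
inline std::string throughSizedBuffer(const std::string& s) {
    return std::string(s.data(), s.size());
}
```

If something like throughCString happens on only one of the code paths, the string that is inserted and the string that is later looked up would differ by one character, which would be consistent with a lookup that fails without any earlier error. Again, this is an assumption about the cause, not a confirmed diagnosis.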