
Illegal characters in URLs in NTriples export

Open mkroetzsch opened this issue 10 years ago • 1 comments

The NTriples export contains URLs that use the character "|", which is not allowed in NTriples. For example, the export contains the following triple:

<http://www.wikidata.org/entity/R4a809378d29bcd83ef35faaed4415cfa> <http://www.wikidata.org/entity/P854r> <http://viaf.org/processed/BNC|a10176597> .

This triple is based on the reference given for the "Gran Enciclopèdia Catalana ID" of Virgil.

We need to investigate why OpenRDF fails to encode this properly and what can be done to avoid it. There may also be further issues (a user suggested that "{" occurs somewhere in a URL as well). If OpenRDF does not work as expected here, it might actually be easier to implement the NTriples grammar directly instead of using OpenRDF for this export. Another possibility is that we are using OpenRDF incorrectly.
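
As a starting point for the investigation, here is a minimal sketch of one possible workaround: percent-encoding the characters that the IRIREF production of the RDF 1.1 N-Triples grammar excludes (code points up to U+0020 and `<`, `>`, `"`, `{`, `}`, `|`, `^`, the backtick, and `\`) before a URL is written out. The class and method names are hypothetical and not part of Wikidata-Toolkit or OpenRDF. Note that percent-encoding changes the URL string, so whether this is acceptable for the export is an open question.

```java
// Hypothetical helper, not part of Wikidata-Toolkit or OpenRDF.
final class NTriplesIriSanitizer {

    // Printable characters excluded by the IRIREF production of the
    // RDF 1.1 N-Triples grammar; code points up to U+0020 are handled separately.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    // True if the character may not appear literally between '<' and '>'.
    static boolean isForbidden(char c) {
        return c <= 0x20 || FORBIDDEN.indexOf(c) >= 0;
    }

    // Replaces each forbidden character with its %XX escape.
    // All forbidden characters are ASCII, so one escape per character suffices.
    static String percentEncodeForbidden(String iri) {
        StringBuilder sb = new StringBuilder(iri.length());
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (isForbidden(c)) {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The object URL from the triple quoted above.
        System.out.println(percentEncodeForbidden("http://viaf.org/processed/BNC|a10176597"));
        // prints http://viaf.org/processed/BNC%7Ca10176597
    }
}
```

A check like `isForbidden` could also be run over an existing export to find out which other characters (such as the reported "{") actually occur.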

mkroetzsch • Mar 28 '15 15:03

@w013ccw Is this the issue with character encoding you were referring to? Do you know about additional encoding issues?

mkroetzsch • Mar 28 '15 15:03