Incorrect default encoding (ISO-8859-1) assumed when submitting SPARQL query as POST request
According to the SPARQL 1.1 Protocol (https://www.w3.org/TR/sparql11-protocol/#query-via-post-direct), the body of a query sent directly via POST must be encoded as UTF-8.
Blazegraph reads the request body with the servlet request's getReader method:
https://github.com/blazegraph/database/blob/bc439f9d6c37bb4a1d33878b2054853714d5d9a9/bigdata-core/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java#L919-L922
When the request does not declare a character encoding, getReader falls back to the container default, which is ISO-8859-1:
https://github.com/apache/tomcat/blob/7c0dd42ac4e9533d73d4ba50791ab2dda9d79760/java/org/apache/coyote/Constants.java#L30
As a result, non-ASCII characters in the query are decoded incorrectly. The following query reproduces the problem:
curl -H "Content-Type: application/sparql-query" -d "SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql
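The effect on the query above can be reproduced locally without contacting any server. The following is a minimal Python sketch, assuming the client sends the body as UTF-8 bytes (as curl does from a UTF-8 terminal) and the server decodes those bytes as ISO-8859-1. The variable names are illustrative only and do not come from Blazegraph.

```python
# The query string from the curl example above.
query = "SELECT ?x { BIND('Curaçao' As ?x) }"

# Assumption: the client transmits the body as UTF-8 bytes.
body = query.encode("utf-8")

# What a reader that falls back to ISO-8859-1 makes of those bytes:
# the two-byte UTF-8 sequence for 'ç' (0xC3 0xA7) becomes two characters.
misread = body.decode("iso-8859-1")

# What a reader that uses UTF-8 makes of the same bytes.
correct = body.decode("utf-8")

print(misread)
print(correct)
```

Only the character outside the ASCII range is affected, which is why plain ASCII queries appear to work.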
This problem occurs, for example, when Jena queries Wikidata from a SPARQL SERVICE clause; see https://github.com/apache/jena/issues/1259#issuecomment-1100607544
It is most likely also the cause of issue #206.
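Until the default is fixed on the server side, a client could try declaring the charset explicitly in the Content-Type header, since getReader uses the encoding declared by the request when one is present. The following Python sketch only builds such a request and does not send it. The helper name build_request is hypothetical, and whether the Wikidata endpoint honors the charset parameter is an assumption that has not been verified here.

```python
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a direct-POST SPARQL request that declares UTF-8 explicitly.

    Hypothetical helper for illustration; not part of Blazegraph or Jena.
    """
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),  # body is sent as UTF-8 bytes
        headers={
            # Assumption: the server respects the charset parameter.
            "Content-Type": "application/sparql-query; charset=UTF-8",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_request("SELECT ?x { BIND('Curaçao' As ?x) }")
# To actually send it: urllib.request.urlopen(request)
```

This is a client-side workaround at best. The protocol already requires UTF-8, so the server should not need the parameter.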