virtuoso-opensource
Incorrect IRI handling in fct when doing content negotiation
I use IRIs in my RDF data and use fct to browse them and to do content negotiation, e.g. for text/turtle.
However, Unicode characters are handled inconsistently in the Location: header of the HTTP redirect.
When I do:
curl -i -H "Accept: text/turtle" https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště
I get:
location: https://linked.opendata.cz/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk%2Fnad%C5%99azená-pracovi%C5%A1t%C4%9B%3E&format=text%2Fturtle
Note the á there: all the other non-ASCII characters are percent-encoded, but á is not. This causes problems with libraries that expect an ASCII string, such as those implementing the fetch API, e.g. https://www.npmjs.com/package/node-fetch
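For anyone hitting the same problem on the client side, here is a minimal sketch of a possible workaround, not a fix for the server behaviour. It assumes the stray characters should be encoded as UTF-8, like the ones that are already escaped in the header above. The function name `ascii_location` is made up for this example.

```python
from urllib.parse import quote

# Characters left untouched: "%" (so existing escapes are not double-encoded)
# plus the RFC 3986 reserved characters that give a URL its structure.
# Caveat: a literal "%" that is not part of an escape is also left as-is.
_SAFE = "%:/?#[]@!$&'()*+,;="

LOCATION = (
    "https://linked.opendata.cz/sparql?query=define%20sql%3Adescribe-mode"
    "%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz"
    "%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk"
    "%2Fnad%C5%99azená-pracovi%C5%A1t%C4%9B%3E&format=text%2Fturtle"
)


def ascii_location(location: str) -> str:
    """Percent-encode (as UTF-8) any non-ASCII characters left in a Location value."""
    return quote(location, safe=_SAFE)


fixed = ascii_location(LOCATION)
print(fixed)
```

With the header value from this report, only the raw á changes (to %C3%A1); the escapes that were already present are kept as they are.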
There is an nginx reverse proxy in front of Virtuoso, configured with:
location /resource/ {
    include hsts-cors.conf;
    proxy_pass http://127.0.0.1:8890/describe/?url=https://linked.opendata.cz$uri;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass_request_headers on;
    proxy_redirect http://linked.opendata.cz https://linked.opendata.cz;
    sub_filter_once off;
    sub_filter 'href="http://linked.opendata.cz' 'href="https://linked.opendata.cz';
    sub_filter 'src="http://linked.opendata.cz' 'src="https://linked.opendata.cz';
}
When I tried tunneling to the server to bypass the proxy, I got an even worse result:
curl -i -H "Accept: text/turtle" "http://localhost:8890/describe/?url=https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště"
HTTP/1.1 303 See Other
Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
Date: Thu, 14 Jul 2022 06:45:56 GMT
Accept-Ranges: bytes
TCN: choice
Vary: negotiate,accept
Location: http://localhost:8890/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk%2Fnadrazen%3F-pracovi%3Fte%3E&format=text%2Fturtle
The IRI here, URL-decoded, is:
https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadrazen?-pracovi?te
ř becomes r, ě becomes e, and á and š are replaced by ?.
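The decoding can be reproduced with a short script. This is only a sketch: it pulls the DESCRIBE IRI out of the Location header above and compares its last path segment with the requested one, character by character. The function name `described_iri` and the regular expression are made up for this example, and it says nothing about why Virtuoso substitutes the characters.

```python
import re
import unicodedata
from urllib.parse import parse_qs, urlsplit

LOCATION = (
    "http://localhost:8890/sparql?query=define%20sql%3Adescribe-mode"
    "%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz"
    "%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk"
    "%2Fnadrazen%3F-pracovi%3Fte%3E&format=text%2Fturtle"
)
# Last path segment of the requested IRI, normalized to precomposed characters.
REQUESTED = unicodedata.normalize("NFC", "nadřazená-pracoviště")


def described_iri(location):
    """Return the IRI inside DESCRIBE <...> in the redirect's SPARQL query, or None."""
    sparql = parse_qs(urlsplit(location).query)["query"][0]  # parse_qs URL-decodes
    match = re.search(r"DESCRIBE\s*<([^>]*)>", sparql)
    return match.group(1) if match else None


iri = described_iri(LOCATION)
received = iri.rsplit("/", 1)[-1]
# Pairs of (requested, received) characters that differ.
changed = [(a, b) for a, b in zip(REQUESTED, received) if a != b]
print(iri)
print(changed)
```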
Thanks for the report. We will have a look at what is going on there.