virtuoso-opensource

Incorrect IRI handling in fct when doing content negotiation

Open · jakubklimek opened this issue 2 years ago · 1 comment

I use IRIs in my RDF and use fct to browse them and to do content negotiation, e.g. for text/turtle. However, non-ASCII (Unicode) characters are handled inconsistently in the Location: header of the HTTP redirect. When I run:

    curl -i -H "Accept: text/turtle" https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště

I get:

    location: https://linked.opendata.cz/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk%2Fnad%C5%99azená-pracovi%C5%A1t%C4%9B%3E&format=text%2Fturtle

Note the á there: all the other non-ASCII characters (ř, š, ě) are percent-encoded, but á is left as-is. This breaks libraries that expect an ASCII-only header value, such as implementations of the Fetch API, e.g. https://www.npmjs.com/package/node-fetch
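A minimal Python sketch of the consistent behavior one would expect here: every non-ASCII character is mapped to its UTF-8 bytes and each byte is percent-encoded (the IRI-to-URI mapping of RFC 3987, section 3.1), while ASCII characters, including existing %XX escapes, are left untouched. `iri_to_uri` and `is_ascii_header_value` are illustrative helper names, not Virtuoso or fct functions, and the sketch only post-processes the header value; it does not show how Virtuoso actually builds the redirect.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Percent-encode only the non-ASCII characters of an IRI (UTF-8 bytes, RFC 3987 3.1).

    ASCII characters, including already-present %XX escapes, are copied unchanged.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


def is_ascii_header_value(value: str) -> bool:
    """True if the value contains only ASCII characters."""
    return value.isascii()


# The Location value returned through the proxy, as quoted in the report.
location = (
    "https://linked.opendata.cz/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22"
    "%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2F"
    "knowledge-graph-browser%2Fview%2Fuk%2Fnad%C5%99azená-pracovi%C5%A1t%C4%9B%3E"
    "&format=text%2Fturtle"
)

print(is_ascii_header_value(location))  # False: the raw "á" is still there
fixed = iri_to_uri(location)
print(is_ascii_header_value(fixed))  # True
print("nad%C5%99azen%C3%A1-pracovi" in fixed)  # True
```

Applied to the Location value above, the raw á becomes %C3%A1, the same UTF-8 percent-encoding already used for ř (%C5%99).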

There is an nginx reverse proxy in front of Virtuoso with this configuration:

    location /resource/ {
        include hsts-cors.conf;
        proxy_pass http://127.0.0.1:8890/describe/?url=https://linked.opendata.cz$uri;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_pass_request_headers      on;
        proxy_redirect http://linked.opendata.cz https://linked.opendata.cz;

        sub_filter_once off;
        sub_filter 'href="http://linked.opendata.cz' 'href="https://linked.opendata.cz';
        sub_filter 'src="http://linked.opendata.cz' 'src="https://linked.opendata.cz';
    }
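The `proxy_pass` line above appends `$uri` to `?url=https://linked.opendata.cz` by plain string concatenation. For comparison, a Python sketch that builds the same /describe/ request with the whole IRI percent-encoded as a query-parameter value. The endpoint and base IRI are copied from the config; `describe_url` is a hypothetical helper, and whether Virtuoso then returns an ASCII-only, lossless `Location:` for such a request is an untested assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: endpoint and base IRI copied from the proxy_pass line in the report.
DESCRIBE_ENDPOINT = "http://127.0.0.1:8890/describe/"
BASE_IRI = "https://linked.opendata.cz"


def describe_url(path: str) -> str:
    """Hypothetical helper: /describe/ URL with BASE_IRI + path encoded as the url value."""
    # urlencode() uses quote_plus with safe="", so the value is encoded as UTF-8 and
    # every byte outside the unreserved set (including ':' and '/') is percent-encoded.
    return DESCRIBE_ENDPOINT + "?" + urlencode({"url": BASE_IRI + path})


path = "/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště"
url = describe_url(path)
print(url)
print(url.isascii())  # True
# Decoding the parameter again yields the original IRI unchanged.
print(parse_qs(urlsplit(url).query)["url"][0] == BASE_IRI + path)  # True
```

Sending this encoded form directly to port 8890 (e.g. with curl) could help narrow down whether the character substitution depends on how the incoming request is encoded; this is a suggested check, not a confirmed cause.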

When I tunneled directly to the Virtuoso server to bypass the proxy, I got an even worse result:

    curl -i -H "Accept: text/turtle" "http://localhost:8890/describe/?url=https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště"
    HTTP/1.1 303 See Other
    Server: Virtuoso/07.20.3233 (Linux) x86_64-pc-linux-gnu
    Connection: Keep-Alive
    Content-Type: text/html; charset=UTF-8
    Date: Thu, 14 Jul 2022 06:45:56 GMT
    Accept-Ranges: bytes
    TCN: choice
    Vary: negotiate,accept
    Location: http://localhost:8890/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2Fknowledge-graph-browser%2Fview%2Fuk%2Fnadrazen%3F-pracovi%3Fte%3E&format=text%2Fturtle

URL-decoded, the IRI here is https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadrazen?-pracovi?te: ř and ě become r and e, while á and š are replaced by ?.
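To compare the two redirects side by side, a small Python sketch that extracts the IRI from the `DESCRIBE <...>` part of the query in each Location value and checks it against the requested IRI. `describe_iri` is a hypothetical helper, and its regular expression assumes the query has the `define ... DESCRIBE <IRI>` shape shown above.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def describe_iri(location: str) -> Optional[str]:
    """Return the IRI inside DESCRIBE <...> in the query parameter of a Location URL."""
    # parse_qs percent-decodes the parameter values (as UTF-8).
    params = parse_qs(urlsplit(location).query)
    query = params.get("query", [""])[0]
    match = re.search(r"DESCRIBE\s*<([^>]*)>", query)
    return match.group(1) if match else None


requested = "https://linked.opendata.cz/resource/knowledge-graph-browser/view/uk/nadřazená-pracoviště"

# Location values quoted in the report: through nginx, and directly from port 8890.
via_proxy = (
    "https://linked.opendata.cz/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22"
    "%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2F"
    "knowledge-graph-browser%2Fview%2Fuk%2Fnad%C5%99azená-pracovi%C5%A1t%C4%9B%3E"
    "&format=text%2Fturtle"
)
direct = (
    "http://localhost:8890/sparql?query=define%20sql%3Adescribe-mode%20%22CBD%22"
    "%20%20DESCRIBE%20%3Chttps%3A%2F%2Flinked.opendata.cz%2Fresource%2F"
    "knowledge-graph-browser%2Fview%2Fuk%2Fnadrazen%3F-pracovi%3Fte%3E"
    "&format=text%2Fturtle"
)

for name, loc in (("via proxy", via_proxy), ("direct", direct)):
    print(name, "ascii-only:", loc.isascii(), "IRI preserved:", describe_iri(loc) == requested)
# via proxy ascii-only: False IRI preserved: True
# direct ascii-only: True IRI preserved: False
```

The output reflects the two failure modes: through the proxy the IRI survives but the header is not ASCII-only; directly on port 8890 the header is ASCII-only but the IRI is lossy.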

(jakubklimek, Jul 14 '22 06:07)

Thanks for the report. We will have a look at what is going on there.

(pkleef, Jul 14 '22 09:07)