packages-semweb

`rdf_load/[1,2]` changes the request IRI the user supplies

Open wouterbeek opened this issue 9 years ago • 3 comments

rdf_load/[1,2] performs IRI normalization before sending an HTTP request. IRI normalization introduces unnecessary percent escaping that is not supported by all servers, occasionally resulting in unsuccessful requests.

Reproducible case:

?- [library(semweb/rdf_db)].
?- [library(semweb/rdf_http_plugin)].
?- rdf_load('http://dbpedia.org/resource/Category:Politics').
% Parsed "http://dbpedia.org/resource/Category%3APolitics" in 0.00 sec; 0 triples
true.

If you visit http://dbpedia.org/resource/Category:Politics then you see that there are triples there.

wouterbeek, Jan 24 '16 16:01

Great. In a previous round, we decided that ":" must be escaped to prevent relative URIs from being read as absolute ones. The above makes it really hard to tell when you can or must escape. rdf_load escapes so that it can process the unescaped IRIs in the triples ...

JanWielemaker, Jan 24 '16 16:01

I'm not clear on the benefit of escaping ":" in places where this is not required. The only benefit that I can think of is processing speed, since the syntax for relative IRIs is then recognizably different from that of absolute IRIs.
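
To make "only where required" concrete, here is a minimal JavaScript sketch. It assumes the caller already knows that the input is a relative path, and it escapes ":" only in the first path segment, which is the one place where RFC 3986 (section 4.2) says a colon would be mistaken for a scheme delimiter. The helper name `escapeRelativePath` is hypothetical and is not part of any library discussed here.

```javascript
// Hypothetical helper, for illustration only.
// Assumption: `path` is known by the caller to be a relative path reference.
function escapeRelativePath(path) {
  // The first segment ends at the first "/", "?" or "#" (or at the end of the string).
  const end = path.search(/[\/?#]/);
  const first = end === -1 ? path : path.slice(0, end);
  const rest = end === -1 ? "" : path.slice(end);
  // Only a ":" in the first segment can be read as a scheme delimiter,
  // so later segments are left untouched.
  return first.replace(/:/g, "%3A") + rest;
}

console.log(escapeRelativePath("Category:Politics"));   // Category%3APolitics
console.log(escapeRelativePath("resource/Category:Politics")); // resource/Category:Politics
console.log(escapeRelativePath("./Category:Politics")); // ./Category:Politics
```

An absolute request IRI such as http://dbpedia.org/resource/Category:Politics would never go through such a helper, so it would reach the server unchanged.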

wouterbeek, Jan 24 '16 17:01

It is rather odd. RFC 3986 indeed allows ":" in a path segment. However, in a relative URL, a ":" in the first path segment makes the reference ambiguous (it can also be read as an absolute URL). This problem was raised by Samer a while ago and led to the decision to escape the ":". Looking at JavaScript, we get

> encodeURIComponent("aap:noot")
"aap%3Anoot"
> encodeURI("http://www.example.com/aap:noot")
"http://www.example.com/aap:noot"

I'm a little lost :(
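
The ambiguity itself can be reproduced with the standard `URL` constructor (available in browsers and Node.js). This is only a sketch: the base URL is an assumed example value, and it shows the behaviour of the WHATWG URL parser, not of rdf_load.

```javascript
// Assumed example base; any http(s) base with a directory path would do.
const base = "http://www.example.com/dir/";

// "aap:" matches the scheme syntax, so the reference is parsed as an
// absolute URL and the base is ignored.
const ambiguous = new URL("aap:noot", base);
console.log(ambiguous.protocol); // aap:

// A leading dot-segment (the fix suggested in RFC 3986, section 4.2)
// keeps the reference relative without any escaping.
const dotted = new URL("./aap:noot", base);
console.log(dotted.href); // http://www.example.com/dir/aap:noot

// Percent-escaping the colon also keeps it relative, but the escape
// stays in the resulting URL.
const escaped = new URL("aap%3Anoot", base);
console.log(escaped.href); // http://www.example.com/dir/aap%3Anoot
```

The last two results are different strings, so a server that does not treat "%3A" and ":" as equivalent will answer them differently.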

JanWielemaker, Jan 24 '16 20:01