http-client icon indicating copy to clipboard operation
http-client copied to clipboard

Redirection URI isn't always correctly escaped

Open k0ral opened this issue 3 years ago • 2 comments

tl;dr

This is an edge case, probably not worth fixing, but worth raising awareness about.

When this happens

When the following conditions hold:

Example

HTTP/1.1 302 Found
Content-Type: text/html; charset=utf-8
Location: http://example.com/path/to/file[3].html

The Location URI has unescaped square brackets outside of their reserved semantic (delimiters for an IP literal within host.)

What should happen vs what actually happens

Redirection should be followed by http-client, but it is actually not: the 3** response is returned as-is.

Why

I believe the issue is in Network.HTTP.Client.Response.getRedirectedRequest function, specifically at the following line:

let l = escapeURIString isAllowedInURI (S8.unpack l')

This will never escape reserved characters, whereas in some cases such characters should be escaped. Indeed, reserved characters are allowed in URIs, but only in very specific situations.

For example, square brackets should be escaped if, and only if, they are used as delimiters for an IP literal within host:

  • http://[1080::8:800:200C:417A]/foo is valid
  • http://example.com/path/to/file[3].html is invalid and should instead be escaped as http://example.com/path/to/file%5B3%5D.html

When this happens, l is an invalid URI, and the subsequent parseURIReference l (at this line) returns Nothing, effectively breaking the redirection.

What to do about it (my 2 cents)

This is clearly an edge case, I don't expect many users are / will be impacted by this issue (I am entertaining the idea that I'm the first observer, but I should stop kidding myself :) ). Fixing the issue seems tricky to me: we need a "smart" escaping function that is aware of all the intricacies of URI reserved characters -- and even then, I expect some cases would end up in a guessing game with no correct answer. Thus, unless you can think of a simple solution, the benefits/cost ratio seems very low to me, and I would consider it perfectly fine to keep the current code as-is -- but then it might be worth mentioning this limitation somewhere in the documentation.

k0ral avatar Jun 05 '21 21:06 k0ral

I looked at the relevant parts of the spec, e.g. https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.2, and didn't find any comments about this. Do you have any spec references or demonstrations of what other libraries do here?

snoyberg avatar Jun 06 '21 02:06 snoyberg

I haven't checked what other libraries in other languages do -- I'll update this thread if I find the time to investigate further.

On second thought, I am starting to believe that a clean solution should involve replacing the "escape, then parse" part into "escape while parsing", since proper escaping depends on context that only a URI parser is aware of.

k0ral avatar Jun 06 '21 10:06 k0ral