IPv6 hosts are not correctly represented in the URI module
When working on this issue:
https://github.com/gleam-lang/httpc/issues/34
and the ensuing discussion on Discord:
https://discord.com/channels/768594524158427167/1440650512319119490
@Nicd identified that the current behaviour of the stdlib uri module is not correct when parsing IPv6 addresses that are given in the URI.
let assert Ok(u) = uri.parse("http://[2600:1406:bc00:53::b81e:94c8]")
let us = uri.to_string(u)
echo u
echo us
Produces:
Uri(Some("http"), None, Some("2600:1406:bc00:53::b81e:94c8"), None, "", None, None)
"http://2600:1406:bc00:53::b81e:94c8/"
Whereas the expected output is:
Uri(Some("http"), None, Some("[2600:1406:bc00:53::b81e:94c8]"), None, "", None, None)
"http://[2600:1406:bc00:53::b81e:94c8]/"
(Note the brackets around the IPv6 addresses.)
Looking at this further I found:
https://github.com/erlang/otp/issues/4731
There, the decision was made that, since the parsed URI is "...not RFC compliant...", there was no need to include the brackets around the IP address. This means that any downstream user of the Erlang stdlib's uri_string:parse/1 needs to post-process the result in some way to re-add the brackets.
inets/httpc itself does this here:
https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_lib/http_util.erl#L202-L217
and called here:
https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_client/httpc.erl#L1193
and here:
https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_client/httpc_response.erl#L409
So, the question is: should the Gleam stdlib follow the Erlang decision and punt the bracketing of parsed IPv6 addresses downstream to consuming applications, or should it follow the RFC and keep the brackets in the host component of a parsed URI?
Personally, I don't really agree with the erlang decision, and think that the bracketed version of an IPv6 URI is correct, for two reasons:
- Bracketing the IPv6 address is universally supported in any URI context: having the brackets generally causes no problems, whereas omitting them does.
- Re-implementing the same logic in downstream applications is likely to cause problems with correct handling of these URIs, such as naive double bracketing ("[[...]]") or problems with host / port differentiation (2600:1406:bc00:53::b81e:94c8:8000).
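To make the second point concrete, here is a minimal JavaScript sketch of the kind of helper a downstream application might end up writing. `bracketHost` and `joinHostPort` are hypothetical names, not part of any library mentioned here, and the "contains a colon" check is only a rough heuristic for spotting an IPv6 literal.

```javascript
// Hypothetical helper: re-add brackets to a host that a parser returned bare.
function bracketHost(host) {
  // Already bracketed: return as-is so we never produce "[[...]]".
  if (host.startsWith("[") && host.endsWith("]")) return host;
  // Heuristic (assumption): a colon inside a host means an IPv6 literal.
  return host.includes(":") ? `[${host}]` : host;
}

// Hypothetical helper: build "host" or "host:port" for use in a URI.
function joinHostPort(host, port) {
  const h = bracketHost(host);
  return port === undefined ? h : `${h}:${port}`;
}

const host = "2600:1406:bc00:53::b81e:94c8";

// Naive concatenation is ambiguous: is 8000 a port or one more address group?
console.log(`${host}:8000`);
// With brackets the port is unambiguous.
console.log(joinHostPort(host, 8000));
// Bracketing twice leaves the host unchanged.
console.log(bracketHost(bracketHost(host)));
```

Every consumer of an unbracketed host would need to get both guards right, which is the duplication a bracket-preserving parser would avoid.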
But that's not my call. I can also see the reason for sticking with the erlang implementation for consistency with the larger implementation.
It's also worth mentioning #523 here, since the implementation of the parsing setup may change soonish? I tried that, but it failed completely. (I opened an issue: https://github.com/pendletong/uri/issues/1)
I agree, the Erlang behaviour is incorrect. Let's fix it.
Is the behaviour you're reporting on both targets or just one?
Copying a message from Discord:
notes about JS: http://[::1]/wibble works correctly: https://playground.gleam.run/#N4IgbgpgTgzglgewHYgFwEYA0IDGyAuES+aIcAtgA4JT4AEA5gDYQCG5A9AK5RwA6SCtVqMW7DlAgwuTfAIGUuAIzoAzJHXKs4SABQBKOsAF06EHAAsEdHnAB0lVrAi6+IC/nyVUHDgG1UDABdDgB3OCUlFjdDAB8APjpJaVk7LUpdWzt8BAB9GHxeJAZ9AQBfARAyoA
but "http://[2600:1406:bc00:53::b81e:94c8]" does not even parse: https://playground.gleam.run/#N4IgbgpgTgzglgewHYgFwEYA0IDGyAuES+aIcAtgA4JT4AEA5gDYQCG5A9AK5RwA6SCtVqMW7DlAgwuTfAIGUuAIzoAzJHXKs4SABQBKOsAF06EHAAsEdHnAB0lVrAi6+IC/nyVUHDgG0AJgA2AAYQjAAWEKDUJRww1ABWAGZUWIAOdAhUAE4InHSAXTdDAB8APjpJaVk7LUpdWzt8BAB9GHxeJAZ9AQBfARA+oA
So on JS it seems to work correctly, when it parses the address. But some valid addresses are not parsed.
If I do this in the playground:
import gleam/uri
pub fn main() {
let assert Ok(u) = uri.parse("http://[::1]/wibble")
echo u
echo uri.to_string(u)
}
I get:
Uri(scheme: Some("http"), userinfo: None, host: Some("[::1]"), port: None, path: "/wibble", query: None, fragment: None)
"http://[::1]/wibble"
Changing the URL to "http://[2600:1406:bc00:53::b81e:94c8]/wobble" indeed does not even parse.
So I think the JS target is "RFC compliant" in the first instance (it keeps the brackets, so that would be consistent with fixing it) but completely broken in the second instance.
Should this be opened as another issue?
Tracking that here is fine too.
I'm new to this, so forgive me if I get this wrong.
Currently, the stdlib has an FFI to the Erlang URI parser here:
https://github.com/gleam-lang/stdlib/blob/main/src/gleam/uri.gleam#L81
However, it also implements this function in "pure" Gleam, which I think is what JavaScript uses as a fallback?
However, JavaScript has a URL.parse() method as well: https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static
Trying this in the browser seems to work OK?
> URL.parse("http://[2600:1406:bc00:53::b81e:94c8]/foo/bar")
URL {origin: 'http://[2600:1406:bc00:53::b81e:94c8]', protocol: 'http:', username: '', password: '', host: '[2600:1406:bc00:53::b81e:94c8]', …}
Can this JS implementation be added as another FFI? If that's the case, does the Gleam implementation need to stay?
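For reference, a rough sketch of what a thin JavaScript wrapper around the built-in URL API might look like. `parseUri` is a hypothetical name, the returned object only loosely mirrors the fields of the Gleam `Uri` record (userinfo is left out for brevity), and this is not how the stdlib FFI is actually written. It uses `new URL(...)` rather than `URL.parse` because the constructor is available in older runtimes. One caveat: the URL API is not an RFC 3986 parser. It rejects relative references and normalises its input, so it would likely behave differently from the Erlang target in some cases.

```javascript
// Hypothetical wrapper: parse an absolute URL into a plain object.
// Returns null when the URL API rejects the input.
function parseUri(input) {
  let url;
  try {
    url = new URL(input);
  } catch (_err) {
    return null;
  }
  return {
    // url.protocol includes the trailing ":", so strip it.
    scheme: url.protocol.slice(0, -1),
    // hostname keeps the brackets around IPv6 literals.
    host: url.hostname === "" ? null : url.hostname,
    port: url.port === "" ? null : Number(url.port),
    path: url.pathname,
    // search and hash include their leading "?" and "#".
    query: url.search === "" ? null : url.search.slice(1),
    fragment: url.hash === "" ? null : url.hash.slice(1),
  };
}

const parsed = parseUri("http://[2600:1406:bc00:53::b81e:94c8]:8000/foo/bar?x=1");
console.log(parsed);

// Relative references are rejected, unlike in an RFC 3986 parser.
console.log(parseUri("/relative/path"));
```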