
IPv6 hosts are not correctly represented in the URI module

Open ahankinson opened this issue 3 months ago • 6 comments

When working on this issue:

https://github.com/gleam-lang/httpc/issues/34

and the ensuing discussion on Discord:

https://discord.com/channels/768594524158427167/1440650512319119490

@Nicd identified that the current behaviour of the stdlib uri module is not correct when parsing IPv6 addresses that are given in the URI.

  let assert Ok(u) = uri.parse("http://[2600:1406:bc00:53::b81e:94c8]")
  let us = uri.to_string(u)
  echo u
  echo us

Produces:

Uri(Some("http"), None, Some("2600:1406:bc00:53::b81e:94c8"), None, "", None, None)
"http://2600:1406:bc00:53::b81e:94c8/"

The expected output is:

Uri(Some("http"), None, Some("[2600:1406:bc00:53::b81e:94c8]"), None, "", None, None)
"http://[2600:1406:bc00:53::b81e:94c8]/"

(Note the brackets around the IPv6 addresses.)

ahankinson avatar Nov 19 '25 13:11 ahankinson

Looking at this further I found:

https://github.com/erlang/otp/issues/4731

There, the decision was made that, since the parsed URI is "...not RFC compliant...", there was no need to include the brackets around the IP address. This means that any downstream user of the Erlang stdlib uri_string:parse/1 has to post-process the result to re-add the brackets.

inets/httpc itself does this here:

https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_lib/http_util.erl#L202-L217

which is called here:

https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_client/httpc.erl#L1193

and here:

https://github.com/erlang/otp/blob/0a2443402f36388ff067fb08f763096e15d444c8/lib/inets/src/http_client/httpc_response.erl#L409

So, the question is: should the Gleam stdlib follow the Erlang decision and punt the bracketing of parsed IPv6 addresses downstream to consuming applications, or should it follow the RFC and keep the brackets in the host component of a parsed URI?

Personally, I don't really agree with the Erlang decision, and think that the bracketed version of an IPv6 URI is correct, for two reasons:

  1. Bracketing the IPv6 address is universally supported in any URI context, which is to say that having the brackets generally does not cause problems, but omitting them does.
  2. Re-implementing the same logic in downstream applications is likely to lead to incorrect handling of these URIs, such as naive double bracketing ("[[...]]") or ambiguity between host and port (2600:1406:bc00:53::b81e:94c8:8000), and so on.

But that's not my call. I can also see the reason for sticking with the Erlang behaviour for consistency with the larger implementation.
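
To make the second point concrete, here is a minimal sketch of the post-processing every downstream consumer would have to get right if the parser strips the brackets. It is written in JavaScript purely for illustration; bracketHost and authority are hypothetical names, not functions from the Gleam or Erlang stdlib, and the colon check assumes the host has already been separated from the port.

```javascript
// Hypothetical helper (illustration only, not an existing library function).
// Wraps a bare IP literal in brackets and leaves everything else untouched.
function bracketHost(host) {
  const alreadyBracketed = host.startsWith("[") && host.endsWith("]");
  // Assumption: a colon inside an already-separated host means it is an
  // IP literal (in practice IPv6), since registered names cannot contain ":".
  if (host.includes(":") && !alreadyBracketed) {
    return "[" + host + "]";
  }
  return host;
}

// Joins a host and an optional port into a URI authority string.
function authority(host, port) {
  const h = bracketHost(host);
  return port === undefined ? h : h + ":" + port;
}

console.log(bracketHost("2600:1406:bc00:53::b81e:94c8"));
console.log(bracketHost("[::1]")); // guarded against double bracketing
console.log(authority("2600:1406:bc00:53::b81e:94c8", 8000));
```

Without the alreadyBracketed guard the helper would produce "[[::1]]", and without the brackets the port in the last call could not be told apart from another address group.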


It's also worth mentioning #523 here, since the implementation of the parsing setup may change soonish? I tried that, but it failed completely. (I opened an issue: https://github.com/pendletong/uri/issues/1)

ahankinson avatar Nov 20 '25 10:11 ahankinson

I agree, the Erlang behaviour is incorrect. Let's fix it.

Is the behaviour you're reporting on both targets or just one?

lpil avatar Nov 20 '25 11:11 lpil

Copying a message from Discord:

notes about JS: http://[::1]/wibble works correctly: https://playground.gleam.run/#N4IgbgpgTgzglgewHYgFwEYA0IDGyAuES+aIcAtgA4JT4AEA5gDYQCG5A9AK5RwA6SCtVqMW7DlAgwuTfAIGUuAIzoAzJHXKs4SABQBKOsAF06EHAAsEdHnAB0lVrAi6+IC/nyVUHDgG1UDABdDgB3OCUlFjdDAB8APjpJaVk7LUpdWzt8BAB9GHxeJAZ9AQBfARAyoA

but "http://[2600:1406:bc00:53::b81e:94c8]" does not even parse: https://playground.gleam.run/#N4IgbgpgTgzglgewHYgFwEYA0IDGyAuES+aIcAtgA4JT4AEA5gDYQCG5A9AK5RwA6SCtVqMW7DlAgwuTfAIGUuAIzoAzJHXKs4SABQBKOsAF06EHAAsEdHnAB0lVrAi6+IC/nyVUHDgG0AJgA2AAYQjAAWEKDUJRww1ABWAGZUWIAOdAhUAE4InHSAXTdDAB8APjpJaVk7LUpdWzt8BAB9GHxeJAZ9AQBfARA+oA

So on JS it seems to work correctly, when it parses the address. But some valid addresses are not parsed.

Nicd avatar Nov 20 '25 11:11 Nicd

If I do this in the playground:

import gleam/uri

pub fn main() {
  let assert Ok(u) = uri.parse("http://[::1]/wibble")
  echo u
  echo uri.to_string(u)
}

I get:

Uri(scheme: Some("http"), userinfo: None, host: Some("[::1]"), port: None, path: "/wibble", query: None, fragment: None)
"http://[::1]/wibble"

Changing the URL to "http://[2600:1406:bc00:53::b81e:94c8]/wobble" indeed does not even parse.

So I think the JS target is "RFC compliant" in the first instance (it keeps the brackets, so that would be consistent with fixing it) but completely broken in the second instance.

Should this be opened as another issue?

ahankinson avatar Nov 20 '25 12:11 ahankinson

Tracking that here is fine too.

lpil avatar Nov 24 '25 12:11 lpil

I'm new to this, so forgive me if I get this wrong.

Currently, the stdlib has an FFI binding to the Erlang URI parser here:

https://github.com/gleam-lang/stdlib/blob/main/src/gleam/uri.gleam#L81

However, it also implements this function in "pure" Gleam, which, I think, is what the JavaScript target uses as a fallback?

JavaScript also has a URL.parse() static method: https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static

Trying this in the browser seems to work OK?

> URL.parse("http://[2600:1406:bc00:53::b81e:94c8]/foo/bar")
      URL {origin: 'http://[2600:1406:bc00:53::b81e:94c8]', protocol: 'http:', username: '', password: '', host: '[2600:1406:bc00:53::b81e:94c8]', …}
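
As a rough sketch of what such a wrapper could look like (an assumption for discussion, not how the stdlib is implemented): parseWithUrl is a hypothetical name, and the returned object shape is made up for this example. It uses the URL constructor rather than URL.parse() because the constructor is available in older runtimes as well.

```javascript
// Hypothetical wrapper around the built-in URL class (illustration only).
// Returns null when the input cannot be parsed as an absolute URL.
function parseWithUrl(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError on input it cannot parse
  } catch (_error) {
    return null;
  }
  return {
    scheme: url.protocol.slice(0, -1), // "http:" -> "http"
    host: url.hostname, // IPv6 literals keep their brackets
    port: url.port === "" ? null : Number(url.port),
    path: url.pathname,
  };
}

const parsed = parseWithUrl("http://[2600:1406:bc00:53::b81e:94c8]:8000/foo/bar");
console.log(parsed.host, parsed.port, parsed.path);
```

One caveat: URL follows the WHATWG URL Standard rather than RFC 3986, so it normalises its input and rejects relative references when no base is given. It may therefore not be a drop-in replacement for the existing parser.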

Can this JS implementation be added as another FFI? If that's the case, does the Gleam implementation need to stay?

ahankinson avatar Nov 24 '25 17:11 ahankinson