rfc3986 Host ']'

The following malformed URL is accepted by rfc3986:

B://]

Although the character ']' may appear in a host, it is only permitted as the closing delimiter of an IP-literal (an IPv6 address or an IPvFuture), which this is not.
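To illustrate that rule, here is a minimal sketch of the bracket-placement check. It is not how rfc3986 or any of the libraries below implement it; the function name brackets_well_placed is made up for this example, and it deliberately does not validate the address text between the brackets.

```python
def brackets_well_placed(host: str) -> bool:
    """Return True if '[' and ']' appear only as the delimiters of an IP-literal.

    Hypothetical helper: checks bracket placement only, not whether the
    text between the brackets is a valid IPv6address or IPvFuture.
    """
    if "[" not in host and "]" not in host:
        return True  # reg-name or IPv4address: no brackets at all
    # RFC 3986: IP-literal = "[" ( IPv6address / IPvFuture ) "]"
    inner = host[1:-1]
    return (
        len(host) >= 2
        and host.startswith("[")
        and host.endswith("]")
        and "[" not in inner
        and "]" not in inner
    )


for candidate in ("]", "[", "[::1]", "example.com"):
    print(repr(candidate), brackets_well_placed(candidate))
```

Under this check, the hosts ']' and '[' fail, while '[::1]' and 'example.com' pass.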

This malformed URL is rejected by urllib, urllib3, hyperlink, yarl, furl, and Boost.URL.
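For comparison, a small sketch of the urllib behavior using only the standard library. The wrapper name urlsplit_rejects is hypothetical; it just reports whether urllib.parse.urlsplit raises ValueError for a given URL.

```python
from urllib.parse import urlsplit


def urlsplit_rejects(url: str) -> bool:
    """Return True if urlsplit raises ValueError for this URL (hypothetical helper)."""
    try:
        urlsplit(url)
    except ValueError:
        return True
    return False


for url in ("B://]", "B://[", "B://[::1]"):
    print(url, "rejected" if urlsplit_rejects(url) else "accepted")
```

Here urlsplit refuses a netloc that contains only one of the two brackets, so both 'B://]' and 'B://[' are rejected, while a properly bracketed IPv6 host is accepted.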

Feb 07 '23 15:02 kenballus

Also a problem when host is '['

Feb 07 '23 22:02 kenballus

Happy to accept a fix here

Feb 15 '23 13:02 sigmavirus24

The question here is whether validation should be implicit when using uri_reference. As it stands, uri_reference parses the URI and returns a result even though that result is invalid:

In [1]: from rfc3986 import uri_reference

In [2]: uri_reference("B://]")
Out[2]: URIReference(scheme='B', authority=']', path=None, query=None, fragment=None)

But you can check the validity of the result with the is_valid() method:

In [3]: _.is_valid()
Out[3]: False

If you want the validation to happen immediately during parsing, you can use ParseResult:

In [4]: from rfc3986 import ParseResult

In [5]: ParseResult.from_string("B://]")
…
InvalidAuthority: The authority (]) is not valid.

So, because the lib is already able to tell that the URL is invalid, it's more of a design decision whether validation should happen by default.

Apr 18 '23 10:04 frenzymadness

Pretty sure uri_reference was always intended to be non-validating so that the rest could be API compatible with urllib3 more or less.

Maybe we need to make that clearer in the docs

Apr 18 '23 13:04 sigmavirus24

@kenballus could you please clarify which approach you used?

Apr 18 '23 13:04 frenzymadness

My mistake; using ParseResult.from_string fixes this. I should have read the docs more thoroughly.

Why is it that we have a non-validating parser? Some large projects (httpx) use this parser, and would probably be better off using ParseResult.from_string.

Apr 19 '23 16:04 kenballus