Allow percent encoding in URI host
According to RFC 3986 section 3.2.2 https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2 , the host part of a URL's authority is allowed to carry percent-encoded characters.
The parsing code only allowed `%` in IP addresses and userinfo, which is both needlessly restrictive and too permissive: it accepts some invalid URLs in which the percent character is not followed by two hex characters.
This PR allows percent encoding everywhere in the host, and checks if the percent-encoding is valid.
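
The validity check described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual parsing code; the helper name `has_valid_percent_encoding` is hypothetical.

```rust
/// Hypothetical helper (not part of the crate): returns true if every `%`
/// in `host` is followed by exactly two ASCII hex digits.
fn has_valid_percent_encoding(host: &[u8]) -> bool {
    let mut i = 0;
    while i < host.len() {
        if host[i] == b'%' {
            // A valid triplet needs two more bytes, both hex digits.
            match (host.get(i + 1), host.get(i + 2)) {
                (Some(hi), Some(lo))
                    if hi.is_ascii_hexdigit() && lo.is_ascii_hexdigit() =>
                {
                    i += 3;
                }
                _ => return false,
            }
        } else {
            i += 1;
        }
    }
    true
}
```

With this sketch, `%2Fvar%2Frun` passes, while a truncated `bad%2` or a non-hex `bad%zz` is rejected.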
This is useful, for instance, to implement Unix domain socket URIs, which would look like unix://<percent encoded socket path>/<http request path>
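
As an illustration of that URI shape, here is a small sketch that builds such a URI from a socket path. The function names `encode_socket_path` and `unix_uri` are hypothetical, and encoding everything outside the RFC 3986 "unreserved" set is a conservative assumption rather than something this PR prescribes.

```rust
/// Hypothetical helper: percent-encodes every byte that is not an
/// RFC 3986 "unreserved" character (ALPHA / DIGIT / "-" / "." / "_" / "~").
fn encode_socket_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for &b in path.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including '/', becomes a %XX triplet.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: places the encoded socket path in the host position,
/// followed by the HTTP request path (assumed to start with '/').
fn unix_uri(socket_path: &str, request_path: &str) -> String {
    format!("unix://{}{}", encode_socket_path(socket_path), request_path)
}
```

For example, `unix_uri("/var/run/app.sock", "/status")` yields `unix://%2Fvar%2Frun%2Fapp.sock/status`.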
I'm torn on whether this is a good thing for the crate; here are some facts, though:
- RFC 3986 defines a generic URI syntax that allows percent encoding in the authority
- RFC 7230 references RFC 3986 section 3.2 without additional stipulations
- `httparse` allows `%` in URIs, but of course this map is the sum of what is allowed in all the parts; that is to say, `httparse` doesn't try to distinguish between them
- the WHATWG URL standard seems to disallow percent encoding in domain parts but not in the authority as a whole
Given these, and without requiring the up-to-date public suffix list referenced by the WHATWG standard, it feels like allowing percent encoding as this PR does would be largely standards-compliant.
Edit: I'm having some doubts after reading more (research, curl PR, curl mailing list). It seems we'd need to actually do the percent decoding (sometimes?) to prevent this feature from turning into a security risk.
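
To make the decoding step concrete, here is a minimal sketch of a percent decoder. The name `percent_decode` is hypothetical and this is not the crate's code. It is also only an assumption that the concern is of this kind: an encoded host such as `exam%70le.com` decodes to `example.com`, so checks done on the raw bytes and on the decoded bytes can disagree.

```rust
/// Hypothetical helper: decodes %XX triplets in `input`.
/// Returns None if a `%` is not followed by two hex digits.
fn percent_decode(input: &[u8]) -> Option<Vec<u8>> {
    // Converts one ASCII hex digit into its numeric value.
    fn hex_val(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(b - b'a' + 10),
            b'A'..=b'F' => Some(b - b'A' + 10),
            _ => None,
        }
    }

    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'%' {
            // `?` bails out on a truncated or non-hex triplet.
            let hi = hex_val(*input.get(i + 1)?)?;
            let lo = hex_val(*input.get(i + 2)?)?;
            out.push(hi << 4 | lo);
            i += 3;
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    Some(out)
}
```

Whether and when the parser should decode (always, or only before comparing hosts) is the open question raised above; the sketch only shows the mechanics.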