Allow percent encoding in URI host

Open paullgdc opened this issue 3 years ago • 1 comments

According to RFC 3986 section 3.2.2 (https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2), the host component of a URL's authority is allowed to carry percent-encoded characters.

The parsing code only allowed % in IP addresses and userinfo. That is needlessly restrictive, and it also lets some invalid URLs through when the percent character is not followed by two hex characters.

This PR allows percent encoding everywhere in the host, and checks if the percent-encoding is valid.
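To illustrate the validity check described above, here is a minimal sketch. It is not the code from this PR: `percent_encoding_is_valid` is a hypothetical helper name, and it only checks that every `%` is followed by two hex digits, without validating the other characters of the host.

```rust
/// Hypothetical helper, for illustration only (not the crate's parser).
/// Returns true if every `%` in `host` is followed by two ASCII hex digits.
fn percent_encoding_is_valid(host: &str) -> bool {
    let bytes = host.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // A valid triplet needs two more bytes, and both must be hex digits.
            match (bytes.get(i + 1), bytes.get(i + 2)) {
                (Some(hi), Some(lo))
                    if hi.is_ascii_hexdigit() && lo.is_ascii_hexdigit() =>
                {
                    i += 3;
                }
                _ => return false,
            }
        } else {
            i += 1;
        }
    }
    true
}
```

With this sketch, a host such as `%2Fvar%2Frun%2Fapp.sock` is accepted, while `bad%2` and `bad%zz` are rejected.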

This is useful, for instance, to implement Unix domain socket URIs, which would look like unix://<percent encoded socket path>/<http request path>
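As a rough sketch of how such a URI could be built, assuming the socket path is encoded byte by byte and only RFC 3986 "unreserved" characters are left as-is. The names `encode_socket_path` and `unix_uri` are hypothetical and not part of the crate.

```rust
/// Hypothetical helper: percent-encodes every byte that is not in the
/// RFC 3986 "unreserved" set (ALPHA / DIGIT / "-" / "." / "_" / "~").
fn encode_socket_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len() * 3);
    for &b in path.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including '/', becomes an uppercase %XX triplet.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: joins the encoded socket path (as the host) with the
/// HTTP request path, which is assumed to start with '/'.
fn unix_uri(socket_path: &str, request_path: &str) -> String {
    format!("unix://{}{}", encode_socket_path(socket_path), request_path)
}
```

For example, `unix_uri("/var/run/app.sock", "/status")` yields `unix://%2Fvar%2Frun%2Fapp.sock/status`, so the slashes of the socket path no longer collide with the request path.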

paullgdc avatar Mar 01 '22 18:03 paullgdc

I'm torn on whether this is a good thing for the crate. Here are some facts, though:

Given these, and without requiring the up-to-date public suffix list referenced by the WHATWG standard, it feels like allowing percent encoding as this PR does would be largely standards-compliant.

Edit: I'm having some doubts after reading more (research, curl PR, curl mailing list). It seems we'd need to actually do the percent decoding (sometimes?) to prevent this feature from turning into a security risk.
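To make the decoding concern concrete, here is a minimal sketch of what decoding a host could look like. This is an assumption about the shape of such a step, not something the crate does today; `hex_val` and `percent_decode` are hypothetical names. The point is that an encoded host like `exa%6Dple.com` and the plain `example.com` decode to the same bytes, so any comparison done on the raw string would treat them as different hosts.

```rust
/// Hypothetical helper: numeric value of one ASCII hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: decodes %XX triplets in `host`.
/// Returns None if a `%` is not followed by two hex digits.
fn percent_decode(host: &str) -> Option<Vec<u8>> {
    let bytes = host.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hi = hex_val(*bytes.get(i + 1)?)?;
            let lo = hex_val(*bytes.get(i + 2)?)?;
            out.push((hi << 4) | lo);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    Some(out)
}
```

The result is returned as raw bytes rather than a `String`, since decoded octets are not guaranteed to be valid UTF-8.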

robjtede avatar May 01 '22 03:05 robjtede