Encode ^ in pathname

Open TimothyGu opened this issue 4 years ago • 7 comments

u = new URL('http://abc.com/a^b');
console.log(u.pathname);

This gives "/a%5Eb" in Chrome and Firefox, as well as in Go and Node.js. Ruby's URI fails to parse the URL with ^, but is fine with %5E. However, the spec and Safari don't escape ^ at the moment.

Should we escape ^ in paths? This would move U+005E (^) from the userinfo percent-encode set to the path percent-encode set.
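To make the proposed move concrete, here is a rough sketch of how the percent-encode sets nest, based on my reading of the spec text at the time of this issue. It lists only the ASCII additions each set makes (C0 controls and code points above U+007E are left out), and names like `proposedPathSet` are made up for illustration.

```javascript
// Each set builds on the previous one; only the ASCII additions are modelled.
// Assumption: this mirrors the spec text as of this issue, not any later revision.
const QUERY_EXTRA = [' ', '"', '#', '<', '>'];
const PATH_EXTRA = ['?', '`', '{', '}'];
const USERINFO_EXTRA = ['/', ':', ';', '=', '@', '[', '\\', ']', '^', '|'];

const querySet = new Set(QUERY_EXTRA);
const pathSet = new Set([...querySet, ...PATH_EXTRA]);
const userinfoSet = new Set([...pathSet, ...USERINFO_EXTRA]);

// The proposal: ^ joins the path set. The userinfo set is unchanged as a whole,
// because it already contains everything in the path set.
const proposedPathSet = new Set([...pathSet, '^']);

console.log(pathSet.has('^'), userinfoSet.has('^'), proposedPathSet.has('^'));
```

Since the userinfo set is defined on top of the path set, the change would only affect where ^ is listed for userinfo, not whether it is encoded there.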

TimothyGu, May 21 '21 09:05

It seems to depend on "is special" in Chrome and Firefox, which isn't ideal.

annevk, May 21 '21 09:05

I mean, Chrome and Firefox don't even escape spaces (or anything not in the C0 controls set) in non-special paths…
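A quick way to see the "is special" dependence in a given engine is to parse the same path under special and non-special schemes. This is only a probe sketch: `probePathnames` and the `foo` scheme are made up here, and the output is deliberately not listed because it differs between implementations and versions.

```javascript
// Parse the same path under several schemes and report the serialised pathname.
// "http" and "https" are special schemes; "foo" stands in for a non-special one.
function probePathnames(path, schemes = ['http', 'https', 'foo']) {
  return schemes.map((scheme) => {
    const url = new URL(`${scheme}://abc.com${path}`);
    return { scheme, pathname: url.pathname };
  });
}

// Includes both ^ and a space so the two cases can be compared.
for (const { scheme, pathname } of probePathnames('/a^b c')) {
  console.log(scheme, JSON.stringify(pathname));
}
```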

TimothyGu, May 21 '21 09:05

RFC 3986 defines path segments to contain these characters:

pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

That doesn't include ^, so it needs to be percent-encoded.
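As a rough illustration of the grammar above, the following sketch checks a character against `pchar` and percent-encodes everything else in a raw path segment. The function names are hypothetical, and it assumes the input is not already percent-encoded, so a literal % is itself encoded.

```javascript
// Literal characters RFC 3986 permits in a path segment (pchar minus pct-encoded).
const UNRESERVED = /[A-Za-z0-9\-._~]/;
const SUB_DELIMS = /[!$&'()*+,;=]/;

function isLiteralPchar(ch) {
  return UNRESERVED.test(ch) || SUB_DELIMS.test(ch) || ch === ':' || ch === '@';
}

// Percent-encode every UTF-8 byte that is not a literal pchar.
// Assumption: `segment` is raw text, so existing "%" characters are encoded too.
function encodeSegment3986(segment) {
  let out = '';
  for (const byte of new TextEncoder().encode(segment)) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && isLiteralPchar(ch)) {
      out += ch;
    } else {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(isLiteralPchar('^'));       // false
console.log(encodeSegment3986('a^b'));  // a%5Eb
```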

mnot, May 22 '21 01:05

@achristensen07 Are you okay with aligning Safari on this?

TimothyGu, May 22 '21 07:05

I think so. This is a case where Chrome and Firefox have the same behavior, so aligning with them would likely increase compatibility. It looks like the most compatible solution depends on "is special". Like I said in issue 608, we really need complete tests with each ASCII code point in each part of a URL, with and without a special scheme.

achristensen07, May 22 '21 22:05

FYI, the discussion around issue #379 has a good overview of the percent-encode sets.

alwinb, May 23 '21 06:05

FWIW, I've been looking at interoperability between this standard and the URL type in Apple's Foundation framework (which I assume would also be of interest to WebKit). It is documented as conforming to RFC 1738.

The biggest difficulty in getting Foundation to parse the serialised output of this standard is the difference in percent-encode sets. This makes it harder for applications to transition to a web-compatible URL model, as converting to a Foundation URL means adding percent-encoding, so the serialised URL string changes. Anything which minimises that would be appreciated, and if it is actually a better description of how browsers behave, it seems like a no-brainer.

That said, if Safari currently does not encode it, and Chrome/Firefox conditionalise it, and neither of them "broke the web", it seems reasonable to conclude that few if any sites actually care whether it is encoded or not. In that case, the better choice IMO would be to unconditionally encode it, with alignment with RFC 3986 as a bonus. Conditional percent-encode sets are awful.

karwa, Nov 12 '21 15:11