Parsing square brackets ([]) in path, query, and fragment
URL parsers in the wild appear to accept square brackets ([]) in path, query, and fragment. The URL spec, on the other hand, appears to say that square brackets in path, query, and fragment cause a validation error.
My question is which one is correct:
- URL parsers are correct, the spec should be tweaked
- the spec is correct, URL parsers should be tweaked
- both are correct (I'm wrong)
My opinion is that the URL parsers are correct, though I'm not too sure. Please let me know if I missed something.
URL parsers in the wild allow square brackets in path, query, and fragment:
new URL('https://example.com/[]?[]#[]'); // doesn't throw
// URL {
// href: 'https://example.com/[]?[]#[]',
// origin: 'https://example.com',
// protocol: 'https:',
// username: '',
// password: '',
// host: 'example.com',
// hostname: 'example.com',
// port: '',
// pathname: '/[]',
// search: '?[]',
// searchParams: URLSearchParams { '[]' => '' },
// hash: '#[]'
// }
I tested with Node.js 16 (stable), Firefox 90 (nightly) and Chrome 90 (stable).
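For comparison, here is a small sketch of how the same parsers treat characters that are in the path percent-encode set. The helper name `pathnameOf` and the sample URL are made up for illustration. The space and curly braces get percent-encoded, while the square brackets pass through unchanged:

```javascript
// Hypothetical helper for illustration: parse a string and return the
// serialized path as the parser produced it.
function pathnameOf(input) {
  return new URL(input).pathname;
}

// Square brackets are kept as-is; the space and curly braces are
// percent-encoded because they are in the path percent-encode set.
console.log(pathnameOf('https://example.com/[ ]{}')); // '/[%20]%7B%7D'
```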
The URL spec says square brackets in path, query, and fragment cause a validation error.
In the basic URL parser's path state (step 2), query state (step 3), and fragment state (step 1):
- If c is not a URL code point and not U+0025 (%), validation error.
- If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
- UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
and the URL code points don't include square brackets (U+005B ([) and U+005D (])):
The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
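To make the check concrete, here is a rough sketch of the first two quoted steps, based on my reading of the definition above. The names `isURLCodePoint` and `findValidationErrors` are my own and are not part of the spec or of any library. The sketch only reports where a validation error would be flagged and does not do the percent-encoding step:

```javascript
// Hypothetical helpers for illustration only.
const URL_PUNCTUATION = "!$&'()*+,-./:;=?@_~";

// Returns true if the code point (a number) is a URL code point
// according to the definition quoted above.
function isURLCodePoint(cp) {
  if (cp >= 0x30 && cp <= 0x39) return true; // 0-9
  if (cp >= 0x41 && cp <= 0x5a) return true; // A-Z
  if (cp >= 0x61 && cp <= 0x7a) return true; // a-z
  if (cp < 0x80) return URL_PUNCTUATION.includes(String.fromCodePoint(cp));
  if (cp < 0xa0 || cp > 0x10fffd) return false;
  if (cp >= 0xd800 && cp <= 0xdfff) return false; // surrogates
  if (cp >= 0xfdd0 && cp <= 0xfdef) return false; // noncharacters
  if ((cp & 0xfffe) === 0xfffe) return false; // U+xxFFFE and U+xxFFFF
  return true;
}

// Walks a path, query, or fragment string and collects the positions
// (counted in code points) where a validation error would be flagged.
function findValidationErrors(component) {
  const errors = [];
  const chars = Array.from(component); // iterate by code point
  for (let i = 0; i < chars.length; i++) {
    const c = chars[i];
    if (c === '%') {
      // "%" must be followed by two ASCII hex digits.
      const next = chars.slice(i + 1, i + 3).join('');
      if (!/^[0-9A-Fa-f]{2}$/.test(next)) errors.push({ index: i, c });
    } else if (!isURLCodePoint(c.codePointAt(0))) {
      errors.push({ index: i, c });
    }
  }
  return errors;
}

console.log(findValidationErrors('/[]'));
// [ { index: 1, c: '[' }, { index: 2, c: ']' } ]
```

With this sketch, `'/a%20b'` yields no errors, while `'/[]'` yields one per bracket.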
The output of the URL parser is not necessarily a URL record that, when serialized, is a valid URL string.
Perhaps we should add this example to https://url.spec.whatwg.org/#urls and state it there.
See also #379, which I guess this is a duplicate of.
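The difference between a validation error and a parse failure can be observed from script. This is only a sketch: `tryParse` is a made-up helper, and the second input is my own example of a string that fails to parse (an empty IPv6 literal in the host). Only a failure makes `new URL()` throw; a validation error is not visible through the API at all:

```javascript
// Hypothetical helper for illustration: returns the serialized URL,
// or null if the parser returned failure (new URL() throws a TypeError).
function tryParse(input) {
  try {
    return new URL(input).href;
  } catch {
    return null;
  }
}

// Brackets in path, query, and fragment: validation errors only, so
// parsing succeeds and the brackets survive serialization.
console.log(tryParse('https://example.com/[]?[]#[]'));
// 'https://example.com/[]?[]#[]'

// Brackets in the host are parsed as an IPv6 literal; an empty one is a failure.
console.log(tryParse('https://[]/')); // null
```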
@annevk I appreciate your clarification!