Escape '#' except when fragment identifier

Open jeremypw opened this issue 4 years ago • 2 comments

Fixes #625. Rather than allowing all '#' characters in URLs, this only allows those in the sequence '/#', which might be fragment identifiers.

Open to suggestions for a more elegant method of doing this.
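
For illustration, the rule described above could be sketched roughly as follows in Python. This is only a guess at the intended behaviour, not the code from this pull request: the function name `escape_hash_unless_after_slash` is hypothetical, and `%23` is simply the percent-encoded form of '#'.

```python
def escape_hash_unless_after_slash(url: str) -> str:
    """Percent-encode every '#' except one that directly follows '/'.

    Hypothetical helper for illustration only; it is not taken from
    the pull request itself.
    """
    out = []
    prev = ""
    for ch in url:
        if ch == "#" and prev != "/":
            # A '#' that is not part of the sequence '/#' is escaped.
            out.append("%23")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, `http://example.com/#top` is left unchanged, while `http://example.com/page#top` becomes `http://example.com/page%23top`, so a fragment that follows a file name would be escaped as well.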

jeremypw avatar Oct 17 '21 12:10 jeremypw

@cassidyjames Hmm, OK thanks - I'll have to think again :thinking:

jeremypw avatar Jan 21 '22 10:01 jeremypw

These rules from w3.org may help:

2.6.4 URL manipulation and creation
To fragment-escape a string input, a user agent must run the following steps:

1. Let input be the string to be escaped.

2. Let position point at the first character of input.

3. Let output be an empty string.

4. Loop: If position is past the end of input, then jump to the step labeled end.

5. If the character in input pointed to by position is in the range U+0000 to U+0020 or is one of the following characters:

   U+0022 QUOTATION MARK character (")
   U+0023 NUMBER SIGN character (#)
   U+0025 PERCENT SIGN character (%)
   U+003C LESS-THAN SIGN character (<)
   U+003E GREATER-THAN SIGN character (>)
   U+005B LEFT SQUARE BRACKET character ([)
   U+005C REVERSE SOLIDUS character (\)
   U+005D RIGHT SQUARE BRACKET character (])
   U+005E CIRCUMFLEX ACCENT character (^)
   U+007B LEFT CURLY BRACKET character ({)
   U+007C VERTICAL LINE character (|)
   U+007D RIGHT CURLY BRACKET character (})

   ...then append the percent-encoded form of the character to output. [RFC3986]

   Otherwise, append the character itself to output.

   This escapes any ASCII characters that are not valid in the URI <fragment> production without being escaped.

6. Advance position to the next character in input.

7. Return to the step labeled loop.

8. End: Return output.
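
The quoted steps can be sketched in a few lines of Python. This is a minimal illustration of the algorithm as quoted, not code from the terminal itself; the names `fragment_escape` and `ESCAPED` are made up for the example, and it assumes the input is the fragment part only, not a whole URL.

```python
# Characters named explicitly in the quoted steps, in addition to
# the range U+0000 to U+0020 which is handled separately below.
ESCAPED = set('"#%<>[\\]^{|}')


def fragment_escape(text: str) -> str:
    """Percent-encode the characters listed in the quoted steps."""
    out = []
    for ch in text:
        if ord(ch) <= 0x20 or ch in ESCAPED:
            # All affected characters are ASCII, so the percent-encoded
            # form is '%' followed by the two-digit hex code point.
            out.append(f"%{ord(ch):02X}")
        else:
            # Every other character is appended unchanged.
            out.append(ch)
    return "".join(out)
```

For example, `fragment_escape("a b#c")` gives `a%20b%23c`. Note that under these rules a '#' inside a fragment is always escaped; only the '#' that separates the fragment from the rest of the URL stays literal, which is why the function should be applied to the fragment alone.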

jeremypw avatar Jan 21 '22 12:01 jeremypw