addressable
Bug with normalization/unencode and leave_encoded
Normalization breaks superscripts in a URL path.
Consider http://en.wiktionary.org/wiki/³ which is distinctly different from http://en.wiktionary.org/wiki/3 -- normalize will convert the former into the latter.
> require 'addressable/template'
=> true
> Addressable::URI.parse("http://en.wiktionary.org/wiki/³")
=> #<Addressable::URI:0x500b93c URI:http://en.wiktionary.org/wiki/³>
> Addressable::URI.parse("http://en.wiktionary.org/wiki/³").normalize
=> #<Addressable::URI:0x500f014 URI:http://en.wiktionary.org/wiki/3>
> Addressable::URI.unencode("http://en.wiktionary.org/wiki/%C2%B3")
=> "http://en.wiktionary.org/wiki/³"
> Addressable::URI.parse("http://en.wiktionary.org/wiki/%C2%B3").normalize
=> #<Addressable::URI:0x50290c2 URI:http://en.wiktionary.org/wiki/3>
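The folding of ³ into 3 is what Unicode compatibility normalization (NFKC) produces, so the output above is consistent with normalize applying an NFKC step to the decoded path. That is an inference from the results, not something confirmed in this thread. The sketch below uses only Ruby's standard library (no Addressable calls) to show the difference between NFC and NFKC for the characters involved; the helper names are hypothetical.

```ruby
# Hypothetical helpers for illustration; they only wrap String#unicode_normalize.

# Compatibility normalization: folds "compatibility" characters such as
# superscripts into their plain equivalents.
def compat_fold(text)
  text.unicode_normalize(:nfkc)
end

# Canonical normalization: leaves superscripts and similar characters alone.
def canonical_fold(text)
  text.unicode_normalize(:nfc)
end

superscript_three = "\u00B3" # ³
trademark = "\u2122"         # ™

puts compat_fold(superscript_three)    # prints 3
puts canonical_fold(superscript_three) # prints ³
puts compat_fold(trademark)            # prints TM
```

Under NFC the two wiktionary URLs stay distinct; under NFKC they collapse into one.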
I also tried to normalize the path directly (so that I could pass the leave_encoded parameter), but that did not work either -- as you can see in the latter examples, the leave_encoded parameter was respected (the ampersand remains encoded) but the superscript was not (it still changes to a regular 3).
> require 'addressable/template'
=> true
> Addressable::URI.normalize_component("/wiki/³", leave_encoded=/[³]/)
=> "/wiki/3"
> Addressable::URI.normalize_component("/wiki/%C2%B3", leave_encoded=/[³]/)
=> "/wiki/3"
> Addressable::URI.normalize_component("/wiki/³%26³")
=> "/wiki/3&3"
> Addressable::URI.normalize_component("/wiki/³%26³", leave_encoded=/[&³]/)
=> "/wiki/3%263"
> Addressable::URI.normalize_component("/wiki/%C2%B3%26%C2%B3", leave_encoded=/[&³]/)
=> "/wiki/3%263"
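One thing worth checking when reproducing these calls: in Ruby, writing `leave_encoded=/[&³]/` inside an argument list is a local-variable assignment whose value is passed positionally, not a keyword argument. Assuming normalize_component takes (component, character_class, leave_encoded) in that order (an assumption to confirm against the installed version), the regex in the calls above would land in the second slot rather than in leave_encoded. The sketch below uses a hypothetical stand-in method with that parameter shape to show the effect; it does not call Addressable.

```ruby
# Hypothetical stand-in with an assumed parameter order; NOT the library method.
# It just reports which parameter each argument was bound to.
def show_args(component, character_class = :default_class, leave_encoded = "")
  {
    component: component,
    character_class: character_class,
    leave_encoded: leave_encoded
  }
end

# `leave_encoded = "&"` assigns a local variable and passes its value as the
# SECOND positional argument.
positional = show_args("/wiki/x", leave_encoded = "&")
p positional[:character_class] # prints "&"
p positional[:leave_encoded]   # prints ""

# Reaching the third parameter requires supplying the second one explicitly.
explicit = show_args("/wiki/x", :default_class, "&")
p explicit[:leave_encoded]     # prints "&"
```

If that is what happened here, the encoded ampersand in the last two results may come from the second argument rather than from leave_encoded.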
This may be related to issue #100, or at least is likely related to the same section of code.
The bug here is with leave_encoded. See http://intertwingly.net/blog/2004/07/31/URI-Equivalence and referenced discussion for why this behavior is correct in the absence of leave_encoded.
I ran into this issue, and it seems like it's still around.
Actual:
Addressable::URI.unencode_component("%E2%84%A2", String, "%E2%84%A2") => "™"
Expected:
Addressable::URI.unencode_component("%E2%84%A2", String, "%E2%84%A2") => "%E2%84%A2"
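To make the expected behavior concrete, here is a minimal sketch of decoding that can leave a multi-byte character encoded. It is not Addressable's implementation: `unencode_leaving` is a hypothetical helper, and it assumes leave_encoded is given as literal characters (for example "™") rather than as a percent-encoded string. The idea is to decode each whole run of percent-escapes as UTF-8 first, then compare characters rather than single bytes.

```ruby
# Hypothetical helper, not part of Addressable.
# Decodes percent-escapes, but re-encodes any decoded character that appears
# in leave_encoded (a String of literal characters).
def unencode_leaving(component, leave_encoded = "")
  component.gsub(/(?:%[0-9a-fA-F]{2})+/) do |run|
    # Turn "%E2%84%A2" into the raw bytes E2 84 A2 and read them as UTF-8.
    decoded = [run.delete("%")].pack("H*").force_encoding(Encoding::UTF_8)
    # Leave the run untouched if the bytes are not valid UTF-8.
    next run unless decoded.valid_encoding?

    decoded.each_char.map do |char|
      if leave_encoded.include?(char)
        # Re-encode every byte of the protected character (uppercase hex).
        char.bytes.map { |byte| format("%%%02X", byte) }.join
      else
        char
      end
    end.join
  end
end

puts unencode_leaving("%E2%84%A2")               # prints ™
puts unencode_leaving("%E2%84%A2", "\u2122")     # prints %E2%84%A2
puts unencode_leaving("/wiki/%C2%B3%26%C2%B3", "&") # prints /wiki/³%26³
```

Comparing whole characters is the design choice that matters here: a check made one decoded byte at a time can never match a character such as ™ that spans three bytes.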
@sporkmonger I know this issue is super old, but do you know if there was any attempt to fix it?
@AnthonyClark I don't think there's been any attempt to address this (links to the blame views: unencode_component, normalize_component)