URI#display_uri raises ArgumentError: invalid byte sequence in UTF-8
Addressable::URI#display_uri raises ArgumentError when called on the URL http://example.com%C2. The same happens for http://%D5.example.com.
I get the same error both with and without IDNA (the second trace goes through addressable/idna/native.rb):
> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
from (irb):1
> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `split'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `to_ascii'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1072:in `normalized_host'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
from (irb):1
The cause seems to be Addressable::URI.unencode: for the URLs above it decodes %C2 or %D5 into a single byte that is not valid UTF-8, and Ruby's string methods then reject the result:
url = Addressable::URI.unencode("http://%D5.example.com")
# => "http://\xD5.example.com"
url.split(".")
# ArgumentError: invalid byte sequence in UTF-8
# from (irb):10:in `split'
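Both traces end inside ordinary String methods (gsub in unencode, split in idna/native.rb), and both refuse to scan a UTF-8 string that contains an invalid byte sequence. A minimal plain-Ruby reproduction without Addressable, assuming the decoded value is the single byte 0xD5 tagged as UTF-8 as in the irb session above; `raises_argument_error?` is an illustrative helper, not part of any library:

```ruby
# The value unencode produced above: byte 0xD5, tagged as UTF-8.
# 0xD5 starts a two-byte UTF-8 sequence, but the next byte is "." (0x2E),
# not a continuation byte (0x80..0xBF), so the string is invalid UTF-8.
broken = "http://\xD5.example.com".b.force_encoding(Encoding::UTF_8)

# Illustrative helper: true if the block raises ArgumentError.
def raises_argument_error?
  yield
  false
rescue ArgumentError
  true
end

puts broken.valid_encoding?                                 # false
puts raises_argument_error? { broken.split(".") }           # true (compare the split frame)
puts raises_argument_error? { broken.gsub(/%\h\h/) { "" } } # true (compare the gsub frame)
puts broken.scrub("?")                                      # http://?.example.com
```

`valid_encoding?` lets a caller detect the problem up front, and `String#scrub` is one way to replace the offending byte if a lossy display string is acceptable.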
These are some gross URIs. 😝
That said, I'm not sure this is a bug. Given what display_uri is supposed to do, this is legitimately an exceptional condition: there is no way to render that hostname correctly as a UTF-8 string. However, http://example.com%C2, gross as it is, is actually a valid URI, I think, so raising an invalid-URI exception doesn't seem right either. That makes me think the current behavior may actually be correct, if a little surprising.
For reference, RFC 3986 (section 3.2.2) allows percent-encoded octets in a registered name:
reg-name = *( unreserved / pct-encoded / sub-delims )
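To make the grammar argument concrete, here is a sketch that translates the RFC 3986 reg-name rule into a Ruby regexp. `REG_NAME` and `reg_name?` are illustrative names. This is a purely syntactic check: it does not look at what the percent-encoded octets decode to (the same RFC section separately says producers should only percent-encode a host to represent UTF-8), which is why both hosts from this issue pass.

```ruby
# RFC 3986, section 3.2.2 and its referenced rules:
#   reg-name    = *( unreserved / pct-encoded / sub-delims )
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded = "%" HEXDIG HEXDIG
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
REG_NAME = /\A(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%\h\h)*\z/

# Syntactic check only: does the host match the reg-name production?
def reg_name?(host)
  REG_NAME.match?(host)
end

puts reg_name?("example.com%C2")  # true
puts reg_name?("%D5.example.com") # true
puts reg_name?("example.com%C")   # false: "%" must be followed by two hex digits
```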
This doesn't reproduce anymore; closing.
irb(main):004:0> Addressable::VERSION::STRING
=> "2.8.1"
irb(main):005:0> Addressable::URI.parse("http://example.com%C2").display_uri
=> #<Addressable::URI:0x86c4 URI:http://example.com%C2/>
irb(main):006:0> Addressable::URI.unencode("http://%D5.example.com")
=> "http://\xD5.example.com"
Probably due to the changes made in https://github.com/sporkmonger/addressable/pull/459
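The 2.8.1 session shows display_uri leaving the undecodable %C2 in place instead of raising. I haven't checked what that pull request actually changed, so the following is only a hypothetical plain-Ruby sketch of that kind of fallback: `display_host` is an illustrative name, not Addressable's API, and it ignores IDNA/punycode entirely.

```ruby
# Hypothetical sketch (not Addressable's implementation): decode %XX escapes
# in a host, but fall back to the original percent-encoded text when the
# decoded bytes are not valid UTF-8, instead of raising ArgumentError.
def display_host(host)
  # Work on a binary copy so each %XX becomes exactly one raw byte.
  decoded = host.b.gsub(/%\h\h/) { |escape| [escape[1, 2]].pack("H2") }
  decoded.force_encoding(Encoding::UTF_8)
  decoded.valid_encoding? ? decoded : host
end

puts display_host("example.com%C2")        # example.com%C2
puts display_host("%D5.example.com")       # %D5.example.com
puts display_host("caf%C3%A9.example.com") # café.example.com
```

This trades the exception for a display string that may still contain escapes, which is the trade-off weighed earlier in the thread.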