URI#display_uri raises ArgumentError: invalid byte sequence in UTF-8

Open roback opened this issue 9 years ago • 1 comments

Addressable::URI#display_uri raises ArgumentError when called on the url http://example.com%C2. The same happens for http://%D5.example.com.

I get the same error both with and without IDNA:

> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1
> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `split'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `to_ascii'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1072:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1

The cause seems to be the call to Addressable::URI.unencode for the above URLs. It returns a string that is tagged as UTF-8 but contains an invalid byte sequence, so encoding-sensitive methods such as split raise:

url = Addressable::URI.unencode("http://%D5.example.com")
# => "http://\xD5.example.com"
url.split(".")
# ArgumentError: invalid byte sequence in UTF-8
#     from (irb):10:in `split'
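The same kind of string can be produced without Addressable, which may help isolate the problem. The sketch below is an illustration only: `naive_unencode` and `safe_host_labels` are hypothetical helpers, not part of Addressable's API, and replacing invalid bytes with "?" is an assumption about what a caller might want, not what the library does.

```ruby
# Hypothetical helper for illustration; not Addressable's implementation.
# Percent-decodes byte by byte, then tags the result as UTF-8.
def naive_unencode(str)
  str.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

decoded = naive_unencode("http://%D5.example.com")

# \xD5 is the lead byte of a two-byte UTF-8 sequence, but the next byte
# is ".", which is not a continuation byte, so the string is malformed.
is_valid = decoded.valid_encoding?

# Hypothetical guard: check validity before calling encoding-sensitive
# methods, and fall back to a scrubbed copy when the bytes are malformed.
def safe_host_labels(str)
  str = str.scrub("?") unless str.valid_encoding?
  str.split(".")
end

labels = safe_host_labels(decoded)
```

Checking `valid_encoding?` first keeps well-formed input untouched and only rewrites strings that would otherwise be unsafe to process.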

roback • Jan 29 '16 09:01

These are some gross URIs. 😝

That said, I'm not sure this is a bug. Given what display_uri is supposed to do, this is legitimately an exceptional condition: there is no way to correctly render that hostname as a UTF-8 string. However, gross as it is, I think http://example.com%C2 is actually a valid URI, so raising an invalid URI exception doesn't seem correct either. That makes me think the current behavior may actually be correct, if perhaps a little surprising. For reference, the grammar for a registered name is:

reg-name = *( unreserved / pct-encoded / sub-delims )
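A rough way to check a host against that production is a regular expression built from the RFC 3986 character sets. This is a minimal sketch: the constant `REG_NAME` is a hypothetical name chosen for illustration, and it tests syntax only, not whether the percent-encoded bytes decode to valid UTF-8.

```ruby
# Hypothetical constant for illustration; syntax check only.
# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
# pct-encoded = "%" HEXDIG HEXDIG
# sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
REG_NAME = /\A(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%\h\h)*\z/

trailing_escape = REG_NAME.match?("example.com%C2")   # complete %XX triplet
leading_escape  = REG_NAME.match?("%D5.example.com")  # complete %XX triplet
truncated       = REG_NAME.match?("example.com%C")    # incomplete triplet
```

Both hosts from this issue satisfy the grammar, because each percent sign is followed by two hex digits; only the truncated escape fails.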

sporkmonger • Aug 07 '18 09:08

This doesn't reproduce anymore, closing

irb(main):004:0> Addressable::VERSION::STRING
=> "2.8.1"
irb(main):005:0> Addressable::URI.parse("http://example.com%C2").display_uri
=> #<Addressable::URI:0x86c4 URI:http://example.com%C2/>
irb(main):006:0> Addressable::URI.unencode("http://%D5.example.com")
=> "http://\xD5.example.com"

Probably due to the changes made in https://github.com/sporkmonger/addressable/pull/459

dentarg • Oct 23 '22 10:10