Add test cases for IDNA2003 vs IDNA2008
Per http://www.unicode.org/reports/tr46/ – make sure Addressable supports IDNA2008 and add tests on major edge cases between IDNA2003 and IDNA2008.
It looks like Addressable follows IDNA 2003 when used with libidn (via the idn-ruby gem), as stated on the libidn website. When libidn (idn-ruby) isn't available and Addressable falls back to the "pure" version, IDNA 2008 appears to be used instead.
# native (libidn)
irb(main):001:0> Addressable::URI.parse("http://faß.de").normalize
=> #<Addressable::URI:0xf78 URI:http://fass.de/>
# pure
irb(main):001:0> Addressable::URI.parse("http://faß.de").normalize
=> #<Addressable::URI:0x13d8 URI:http://xn--fa-hia.de/>
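The ß in the example above is one of the four "deviation" characters that UTS #46 lists as being handled differently by IDNA2003 and IDNA2008, so hosts containing them are natural candidates for the edge-case tests. A minimal sketch for flagging such hosts follows; `DEVIATION_CHARACTERS` and `idna_deviations` are hypothetical names used for illustration and are not part of Addressable.

```ruby
# The four UTS #46 "deviation" characters: code points whose treatment
# differs between IDNA2003 and IDNA2008.
DEVIATION_CHARACTERS = {
  "\u00DF" => "LATIN SMALL LETTER SHARP S",
  "\u03C2" => "GREEK SMALL LETTER FINAL SIGMA",
  "\u200C" => "ZERO WIDTH NON-JOINER",
  "\u200D" => "ZERO WIDTH JOINER",
}.freeze

# Hypothetical helper: returns the distinct deviation characters found in a
# host name. A non-empty result means the native and pure code paths may
# disagree on the normalized form.
def idna_deviations(host)
  host.each_char.select { |char| DEVIATION_CHARACTERS.key?(char) }.uniq
end

idna_deviations("faß.de").each do |char|
  puts format("U+%04X %s", char.ord, DEVIATION_CHARACTERS[char])
end
```

A test suite could iterate over one host per deviation character and assert the expected output for each backend, starting with the faß.de pair shown above.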
Full repro:
arm64 $ docker run -it --rm ruby:3.1.3 bash
root@2a71c4abec4a:/# gem install addressable
Fetching public_suffix-5.0.1.gem
Fetching addressable-2.8.1.gem
Successfully installed public_suffix-5.0.1
Successfully installed addressable-2.8.1
2 gems installed
root@2a71c4abec4a:/# irb -raddressable/uri
irb(main):001:0> Addressable::URI.parse("http://faß.de").normalize
=> #<Addressable::URI:0x13d8 URI:http://xn--fa-hia.de/>
irb(main):002:0>
root@2a71c4abec4a:/# apt-get update && apt-get install -y libidn11-dev
Get:1 http://deb.debian.org/debian bullseye InRelease [116 kB]
Get:2 http://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:3 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:4 http://deb.debian.org/debian bullseye/main arm64 Packages [8072 kB]
Get:5 http://deb.debian.org/debian-security bullseye-security/main arm64 Packages [218 kB]
Get:6 http://deb.debian.org/debian bullseye-updates/main arm64 Packages [12.0 kB]
Fetched 8510 kB in 1s (7654 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libidn11
The following NEW packages will be installed:
libidn11 libidn11-dev
0 upgraded, 2 newly installed, 0 to remove and 10 not upgraded.
Need to get 708 kB of archives.
After this operation, 1212 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bullseye/main arm64 libidn11 arm64 1.33-3 [115 kB]
Get:2 http://deb.debian.org/debian bullseye/main arm64 libidn11-dev arm64 1.33-3 [593 kB]
Fetched 708 kB in 0s (4362 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libidn11:arm64.
(Reading database ... 22781 files and directories currently installed.)
Preparing to unpack .../libidn11_1.33-3_arm64.deb ...
Unpacking libidn11:arm64 (1.33-3) ...
Selecting previously unselected package libidn11-dev:arm64.
Preparing to unpack .../libidn11-dev_1.33-3_arm64.deb ...
Unpacking libidn11-dev:arm64 (1.33-3) ...
Setting up libidn11:arm64 (1.33-3) ...
Setting up libidn11-dev:arm64 (1.33-3) ...
Processing triggers for libc-bin (2.31-13+deb11u5) ...
root@2a71c4abec4a:/# gem install idn-ruby
Fetching idn-ruby-0.1.5.gem
Building native extensions. This could take a while...
Successfully installed idn-ruby-0.1.5
1 gem installed
root@2a71c4abec4a:/# irb -raddressable/uri
irb(main):001:0> Addressable::URI.parse("http://faß.de").normalize
=> #<Addressable::URI:0xf78 URI:http://fass.de/>
So for Addressable to fully support IDNA 2008 it would have to use Libidn2 somehow? 🤔
As discussed in https://github.com/sporkmonger/addressable/issues/408#issuecomment-1421066788 I believe that yes we should add support for libidn2.
I looked for options and found:
- https://github.com/hfm/idn2-ruby: simply going from idn to idn2, but it's actually an abandoned empty shell.
- https://github.com/ogom/ruby-idna: a maintained wrapper dynamically linked using ffi (no static C extension to compile), looks like a good option to me.
- https://github.com/HoneyryderChuck/idnx: provides an ffi wrapper to libidn2, a native Windows API version, and a pure Ruby version (IDNA2003). Might be interesting if we want to offload this part entirely, but in that case I'll have to get the pure Ruby implementation up to standard (#491). It's Apache licensed, not sure if that could be an issue.
- Alternatively, as the FFI wrappers are not too complex, we could probably ditch the dependency, inline the wrapper code and just depend on ffi.
Even though it's not very DRY, I would lean toward option 4 in order to reduce exposure to dependency issues (project abandoned, hacked, gem version incompatibilities, etc.).
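To give an idea of how small an inlined wrapper could be, here is a rough sketch, not the code from any of the gems above. It assumes the ffi gem and a system libidn2 are installed; the module name `Idn2Sketch`, the constant `IDN2_SKETCH_AVAILABLE`, and the library names passed to `ffi_lib` are assumptions for illustration. The C functions and flag values are taken from libidn2's `idn2.h`.

```ruby
# A sketch only: if ffi or libidn2 is missing, the LoadError is swallowed so
# callers could fall back to another implementation.
begin
  require "ffi"

  module Idn2Sketch # hypothetical name, not part of Addressable
    extend FFI::Library
    ffi_lib ["idn2", "libidn2.so.0"] # assumed library names

    # Return code and flag values from idn2.h.
    IDN2_OK = 0
    IDN2_NFC_INPUT = 1
    IDN2_NONTRANSITIONAL = 8

    # int idn2_lookup_u8(const uint8_t *src, uint8_t **lookupname, int flags)
    attach_function :idn2_lookup_u8, [:string, :pointer, :int], :int
    # const char *idn2_strerror(int rc)
    attach_function :idn2_strerror, [:int], :string
    # void idn2_free(void *ptr)
    attach_function :idn2_free, [:pointer], :void

    # Converts a UTF-8 host name to its ASCII (Punycode) form using
    # non-transitional processing.
    def self.to_ascii(host)
      out = FFI::MemoryPointer.new(:pointer)
      flags = IDN2_NFC_INPUT | IDN2_NONTRANSITIONAL
      rc = idn2_lookup_u8(host.encode(Encoding::UTF_8), out, flags)
      raise ArgumentError, idn2_strerror(rc) unless rc == IDN2_OK

      result_ptr = out.read_pointer
      begin
        result_ptr.read_string.force_encoding(Encoding::UTF_8)
      ensure
        idn2_free(result_ptr) # the output buffer is allocated by libidn2
      end
    end
  end

  IDN2_SKETCH_AVAILABLE = true
rescue LoadError
  IDN2_SKETCH_AVAILABLE = false
end
```

With libidn2 present, `Idn2Sketch.to_ascii("faß.de")` should return the same xn--fa-hia.de form that the pure implementation produces in the repro above. The rescue around the whole definition is one way to keep the ffi dependency optional.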
@sporkmonger @dentarg what do you think?
I agree with you on option 4, and on reducing exposure to dependency issues.
At a glance, https://github.com/ogom/ruby-idna looked well maintained (it was recently updated), but then I noticed there were almost 6 years between versions.
And thank you for doing this research. ❤️
PR is here: #496