libidn2 support for IDNA2008+UTS#46 (using ffi)
Following #247, here is the PR adding libidn2 support through a small ffi wrapper (no dependencies other than ffi).
:heavy_check_mark: Benchmark result (benchmark code in the PR), slightly slower than libidn1 as it's doing more work but nothing to worry about, we're still above 100k iterations/sec which is good :rocket: :
# > ruby benchmark/idna.rb
# Rehearsal -------------------------------------------
# pure 5.914630 0.000000 5.914630 ( 5.915326)
# libidn 0.518971 0.003672 0.522643 ( 0.522676)
# libidn2 0.763936 0.000000 0.763936 ( 0.763983)
# ---------------------------------- total: 7.201209sec
# user system total real
# pure 6.042877 0.000000 6.042877 ( 6.043252)
# libidn 0.521668 0.000000 0.521668 ( 0.521704)
# libidn2 0.764782 0.000000 0.764782 ( 0.764863)
:heavy_check_mark: I also added a memory leak test to this benchmark code, to make sure I wasn't forgetting any manual memory free from C allocations. It's all good now and we can easily see by commenting one of the idn2_free lines the memory increasing so looking good on this side:
# Memory leak test for libidn2 (memory should stabilize quickly):
# Memory: 117MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
# Memory: 121MB
:heavy_check_mark: Specs are green locally (I just added 3 more to specify if the implementation is following IDNA2003 or IDNA2008+UTS#46) and on CI. I verified and there's no additional steps required to install libidn2 (already present by default in ubuntu and macos). I confirmed all the envs are running and passing libidn2 test (except Windows of course), we can see this with the increased number of test ran compared with master and the absence of the "Could not load native libidn2 implementation" line.
I saw maybe the profile job will have to be updated as it's currently running idna_mode: [native, pure] only. I'll have a look at this.
→ Edit: done, I've added the "native2" variant to the profile job and it runs fine. I confirmed locally that it's using libidn2 (and we can see ffi in the output). MemoryProfiler won't catch C-level memory leaks though (I tried).
:heavy_check_mark: As usual I also verified this new version against my 150k URL at updown.io, no differences after normalization between master (using pure implementation) and this branch (using native2). Except that it took only 5.6 seconds now (versus 12s with pure) to normalize the 150k URLs.
Hey! I really love interchangeable IDNA backends introduced in this PR. Would it be possible to merge that part? I've tested this PR with my pure-Ruby implementation of IDNA2008/UTS46, and it integrates seamlessly:
# frozen_string_literal: true
require "uri/idna"
module Addressable
module IDNA
module UTS46
def self.to_ascii(value)
URI::IDNA.to_ascii(value)
end
def self.to_unicode(value)
URI::IDNA.to_unicode(value)
end
end
end
end
The IDNA backend concept would allow me to easily package the Addressable backend within the gem.
Overall the uri-idna gem is 2-4 times slower than Addressable::IDNA::Pure, but I would prefer using that instead of the ffi version tbh :sweat_smile:
Cool. Maybe possible. I really want to find some time to properly play with this myself and merge it. It is really awesome work by @jarthod here. And I'm also following your repo @skryukov :) And https://github.com/ruby/uri/issues/76 hehe.
That would be great indeed, and excellent work on the pure Ruby implementation @skryukov :clap: (I'm also following https://github.com/ruby/uri/issues/76 ^^). @dentarg let me know if there is anything I can do to make it easier for you :bow: