rubyinstaller2 icon indicating copy to clipboard operation
rubyinstaller2 copied to clipboard

Windows Ruby 3.1 net/http IPv6 HTTP(S) requests are much slower than Ruby 3.0

Open johnnyshields opened this issue 3 years ago • 26 comments

What problems are you experiencing?

  • net/http requests using IPv6 are significantly slower on Ruby 3.1 than on previous versions.
  • The requests ultimately do succeed, if one waits about a minute or so.
  • Issue affects IPv6 on both HTTP and HTTPS.
  • URLs which only support IPv4 (no IPv6) are fast. See here for example IPv4-only URLs: https://whynoipv6.com/

Steps to reproduce

require 'net/http'

def do_request(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    request = Net::HTTP::Get.new uri
    response = http.request request # <-- very slow for IPv6
  end
end

do_request('https://www.google.com') # <-- very slow for IPv6
do_request('https://global.jd.com') # <-- fast for IPv4

What's the output from ridk version?

---
ruby:
  path: C:/ruby31
  version: 3.1.1
  platform: x64-mingw-ucrt
  cc: gcc.exe (Rev9, Built by MSYS2 project) 11.2.0
ruby_installer:
  package_version: 3.1.1-1
  git_commit: d9d39f1
msys2:
  path: C:\ruby31\msys64
cc: gcc (Rev10, Built by MSYS2 project) 11.2.0
sh: GNU bash, version 5.1.16(1)-release (x86_64-pc-msys)
os: Microsoft Windows [Version 10.0.22000.556]

Original Ticket: Rubgems is slow on Ruby 3.1

Installed RubyInstaller 3.1.1-1. Running gem install and bundle install seems noticeably slower than on Ruby 3.0. Even gem update --system is very slow; takes several minutes before any log lines print at all.

Doing gem update --system (update to rubygems-3.3.9) doesn't seem to make any difference.

I ran some basic code benchmarks on irb and there wasn't a major performance difference between the two versions.

johnnyshields avatar Mar 13 '22 15:03 johnnyshields

@larskanis any thoughts on this?

johnnyshields avatar May 14 '22 07:05 johnnyshields

OK, I've determined that its due to HTTPS being slow. This script reproduces the issue

require 'net/http'

uri = URI('https://www.google.com')

Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new uri
  response = http.request request # very slow
end

Edited ticket description to reflect this.

johnnyshields avatar May 14 '22 10:05 johnnyshields

Tried seeing if it's DNS related, doesn't look to be. Here's a script to use google DNS

class Resolv
  def self.use_google_dns
    # dns
    nameservers = ['8.8.8.8', '8.8.4.4']
    dns = Resolv::DNS.new(nameserver: nameservers, search: ['mydns.com'], ndots: 1)
    
    # resolver
    hosts = Resolv::Hosts.new()
    resolver = new([hosts, dns])
    
    # replace constant
    remove_const :DefaultResolver
    const_set :DefaultResolver, resolver
  end
end

# replace
require 'resolv-replace'

# use google's DNS servers
Resolv.use_google_dns

johnnyshields avatar May 14 '22 11:05 johnnyshields

@larskanis this seems IPv6 related. The following URL which only supports IPv4 is very fast:

require 'net/http'

uri = URI('https://global.jd.com')

Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new uri
  response = http.request request # <-- fast!
end

Adding to description.

johnnyshields avatar May 14 '22 11:05 johnnyshields

@johnnyshields I tried, but I'm not seeing the same issue:

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x64-mingw-ucrt]

https://www.google.com
body bytesize 14049
response time  0.145 sec
#<Addrinfo: [2607:f8b0:4009:807::2004]:443 TCP>

https://msp-greg.github.io
body bytesize 22161
response time  0.045 sec
#<Addrinfo: [2606:50c0:8000::153]:443 TCP>

https://global.jd.com
body bytesize 20592
response time  0.190 sec
#<Addrinfo: 157.185.170.144:443 TCP>

Code:

require 'uri'
require 'net/http'
require 'openssl'

module HTTPS_IPv4_IPv6
  class << self

    def run(uri_str)
      uri = URI(uri_str)

      puts uri_str

      t_st = Process.clock_gettime(Process::CLOCK_MONOTONIC)

      Net::HTTP.start(uri.host, uri.port, use_ssl: true, verify_mode: OpenSSL::SSL::VERIFY_PEER) do |http|
        req = Net::HTTP::Get.new uri.request_uri
        resp = http.request req
        case resp
        when Net::HTTPSuccess
          body = resp.body
          t_end = Process.clock_gettime(Process::CLOCK_MONOTONIC)
          puts "body bytesize #{body.bytesize}"
          puts format("response time %6.3f sec", t_end - t_st)
          puts http.instance_variable_get(:@socket).io.to_io.remote_address.inspect, ''
        when Net::HTTPRedirection
          puts "#{resp.class}\n   Redirect: #{resp['location']}"
          run resp['location']
        else
          puts resp.class
        end
      end
    end

  end
end

puts '', RUBY_DESCRIPTION, ''

HTTPS_IPv4_IPv6.run 'https://www.google.com'

HTTPS_IPv4_IPv6.run 'https://msp-greg.github.io'

HTTPS_IPv4_IPv6.run 'https://global.jd.com'

MSP-Greg avatar May 16 '22 18:05 MSP-Greg

Thank you for this! Output I get from your script (* using OpenSSL::SSL::VERIFY_NONE)

ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x64-mingw-ucrt]

irb(main):079:0> HTTPS_IPv4_IPv6.run 'https://www.google.com'
https://www.google.com
body bytesize 15108
response time 60.162 sec
#<Addrinfo: 172.217.26.228:443 TCP>

irb(main):084:0> HTTPS_IPv4_IPv6.run 'https://msp-greg.github.io'
https://msp-greg.github.io
body bytesize 22161
response time 240.197 sec
#<Addrinfo: 185.199.110.153:443 TCP>

irb(main):081:0> HTTPS_IPv4_IPv6.run 'https://global.jd.com'
https://global.jd.com
body bytesize 20592
response time  0.384 sec
#<Addrinfo: 14.0.43.163:443 TCP>

The "60 sec + X milliseconds" in the google.com run (IPv6) is very curious. Verified it's always 60 sec. Must be something non-critical is timing out after 60 sec, need to investigate what is causing it.

Something is def strange because Ruby 3.0, 2.7, 2.6, etc. on my machine are all installed with Ruby Windows Installer and all of them work fine.

I'm in Tokyo, Japan, by the way, so maybe that has something to with it.

johnnyshields avatar May 16 '22 18:05 johnnyshields

It is interesting that you're having issues. I first was using ruby master, and the times were similar.

Do you see any difference using ping -6 www.google.com and ping -4 www.google.com?

EDIT: It looks like your network isn't using IPv6, all the addresses are IPv4? Note that my output has IPv6 addresses for Google and GitHub, but IPv4 for global.jd.com.

MSP-Greg avatar May 16 '22 19:05 MSP-Greg

ping -6 www.google.com is timing out, -4 is working.

My hunch is that Ruby is failing IPv6 entirely and then falling back to IPv4 after 60 secs.

With Ruby 3.0, I get this--looks like it's also using IPv4, but not doing a long timeout for IPv6. So something is causing Ruby 3.1 to wait for up to 60 sec for the IPv6 timeout.

ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x64-mingw32]

irb(main):040:0> HTTPS_IPv4_IPv6.run 'https://www.google.com'
https://www.google.com
body bytesize 15110
response time  2.390 sec
#<Addrinfo: 142.251.42.196:443 TCP>

irb(main):042:0> HTTPS_IPv4_IPv6.run 'https://msp-greg.github.io'
https://msp-greg.github.io
body bytesize 22161
response time  0.207 sec
#<Addrinfo: 185.199.110.153:443 TCP>

irb(main):044:0> HTTPS_IPv4_IPv6.run 'https://global.jd.com'
https://global.jd.com
body bytesize 20592
response time  0.633 sec
#<Addrinfo: 14.0.43.163:443 TCP>

johnnyshields avatar May 16 '22 19:05 johnnyshields

So something is causing Ruby 3.1 to wait for up to 60 sec for the IPv6 timeout.

Can you test on any other platforms? Not sure if it's a Windows issue. Since my network is different, not sure if I can repo what you're seeing...

I've got WSL2/Ubuntu, both 20.04 and 22.04, but it's only running IPv4. I guess I should figure out how to change that...

MSP-Greg avatar May 16 '22 19:05 MSP-Greg

Yes, I'll check investigate more in next few days. Based on the info we have, my hypothesis is:

  • My system or ISP isn't allowing IPv6
  • Windows Ruby 3.1 handles this situation differently than Ruby 3.0 and earlier. Ruby 3.1 waits for X seconds trying IPv6 and failing, while 3.0 has either better detection or a much shorter timeout.

johnnyshields avatar May 16 '22 19:05 johnnyshields

Confirmed that Ruby 3.1 on WSL Linux (same machine) does not have this issue. As per above it is using IPv4 only, as expected since it's the same machine.

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]

3.1.2 :040 > HTTPS_IPv4_IPv6.run 'https://www.google.com'
https://www.google.com
body bytesize 15083
response time  0.141 sec
#<Addrinfo: 142.251.42.196:443 TCP>

3.1.2 :042 > HTTPS_IPv4_IPv6.run 'https://msp-greg.github.io'
https://msp-greg.github.io
body bytesize 22161
response time  0.188 sec
#<Addrinfo: 185.199.110.153:443 TCP>

3.1.2 :044 > HTTPS_IPv4_IPv6.run 'https://global.jd.com'
https://global.jd.com
body bytesize 20592
response time  1.649 sec
#<Addrinfo: 14.0.43.163:443 TCP>

johnnyshields avatar May 16 '22 19:05 johnnyshields

Question - in your network adapter properties, is IPv6 checked/enabled? If so, and if you uncheck it, do the Windows connection times drop?

JFYI, haven't gotten IPv6 working with WSL2, not even sure it's possible..

MSP-Greg avatar May 16 '22 20:05 MSP-Greg

Disabling IPv6 on adapter does not seem to have any effect--I see the 60s latency either way.

johnnyshields avatar May 17 '22 03:05 johnnyshields

Interesting. JFYI, I tried the most recent releases back to 2.5, and all connected to the IPv6 addresses.

MSP-Greg avatar May 17 '22 04:05 MSP-Greg

I'm surprised that setting IPV4_FALLBACK_ENABLED does not fix the problem since the symptoms presented here are exactly what it was supposed to fix, if I recall correctly.

deivid-rodriguez avatar May 17 '22 07:05 deivid-rodriguez

@MSP-Greg here's a bit more pinpointing of the issue. It's happening in Socket.tcp, my hunch is that there's some difference in the native code that Socket uses between the two Ruby versions. AFAIK the socket timeout itself is set to 60 sec in both Ruby 3.0 and Ruby 3.1, the difference is that Ruby 3.0 somehow "short-circuits" that timeout immediately, while Ruby 3.1 waits until the end.

require 'net/http'

TIMEOUT = 5

# This code is where the issue happens
# s = Socket.tcp "google.com", 80, nil, nil, connect_timeout: TIMEOUT

# Demonstrates the issue inside Socket.tcp
puts '', RUBY_DESCRIPTION, ''
t = nil
Addrinfo.foreach("google.com", 80, nil, :STREAM) do |ai|
  puts ai.inspect
  t = Time.now
  ai.connect(timeout: TIMEOUT)
rescue SystemCallError
  puts "Failed after #{Time.now - t} sec", ''
end
puts "Success after #{Time.now - t} sec"
ruby 3.1.1p18 (2022-02-18 revision 53f5fc4236) [x64-mingw-ucrt]

#<Addrinfo: [2404:6800:4004:822::200e]:80 TCP (google.com)>
Failed after 5.0140926 sec

#<Addrinfo: 172.217.175.110:80 TCP (google.com)>
Success after 0.0454041 sec

johnnyshields avatar May 17 '22 13:05 johnnyshields

OK, think I have a clue.

Under the hood, Ruby 3.0 Net::HTTP uses:

      # net/http.rb line 985
      s = Timeout.timeout(@open_timeout, Net::OpenTimeout) {
        begin
          TCPSocket.open(conn_addr, conn_port, @local_host, @local_port)

While Ruby 3.1 Net::HTTP uses:

        # net/http.rb line 998
        s = Socket.tcp conn_addr, conn_port, @local_host, @local_port, connect_timeout: @open_timeout

For me, TCPSocket.open is fast on both versions, and Socket.tcp is slow on both versions.

Socket.tcp "google.com", 80, nil, nil, connect_timeout: 60

TCPSocket.open("google.com", 80, nil, nil)

johnnyshields avatar May 17 '22 14:05 johnnyshields

And here is the change that did it: https://github.com/ruby/net-http/pull/10

johnnyshields avatar May 17 '22 14:05 johnnyshields

Issue filed with Net::HTTP: https://github.com/ruby/net-http/issues/57

johnnyshields avatar May 17 '22 14:05 johnnyshields

Well, at least I can monkey-patch in the old Net::HTTP#connect method for now!

johnnyshields avatar May 17 '22 14:05 johnnyshields

This is interesting for RubyGems, because we're monkey-patching TCPSocket#initialize to fix similar issues and I think net-http using Socket.tcp directly now means the monkey patch no longer has the desired effect... 🤔

deivid-rodriguez avatar May 17 '22 14:05 deivid-rodriguez

IDK. With a network (including ISP) fully supporting IPv6, there aren't issues. But, maybe the timeouts in http are a bit long for 'modern day' networks?

@open_timeout = 60
@read_timeout = 60
@write_timeout = 60

MSP-Greg avatar May 17 '22 14:05 MSP-Greg

If I recall correctly, the problem was that Ruby networking APIs do not implement happy eyeballs. Our monkey patch in RubyGems is intended to workaround that. There was some work in progress in ruby-core to implement this but it was never completed: https://github.com/ruby/ruby/pull/4038.

deivid-rodriguez avatar May 17 '22 14:05 deivid-rodriguez

@sonalkr132 I think our monkey-patch to TCPSocket#initialize is no longer fixing this issue due to underlying changes in net/http, so this became a problem again that can't be easily workaround 😞. I guess we should try to adapt the monkey patch to Socket.tcp but I think it might be too low level for us to be able to patch it.

deivid-rodriguez avatar Jun 29 '22 10:06 deivid-rodriguez

Same issue on Linux machine, ruby 3.1. curl works well, wget times out, wget --inet4-only works well. Force ipv4 should be an option, but seems it is not. The most simple solution I could find was changing:

URI.open(address)

to:

URI.open(address, open_timeout: 2)

mirelon avatar Nov 07 '22 20:11 mirelon

For what it's worth, the net/http change was reverted at https://github.com/ruby/net-http/pull/74, so I believe gem install net-http should also fix this?

deivid-rodriguez avatar Nov 07 '22 20:11 deivid-rodriguez