Replacing Timeout.timeout in Net::HTTP#connect with socket timeout options
Timeout.timeout is inefficient because it spins up a new thread for each invocation, and can be unsafe (see https://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api/ and http://blog.headius.com/2008/02/ruby-threadraise-threadkill-timeoutrb.html)
Since we can set open timeouts on sockets now, it would be great to start doing that instead. I sent in a patch years ago, but it was rejected with the reasoning that without Timeout.timeout, there would be no way to time out the DNS lookup (https://bugs.ruby-lang.org/issues/12435).
However, it turns out that Timeout.timeout can't time out the getaddrinfo C call in the first place! So the status quo is that the open_timeout option has no effect on the DNS lookup anyway! Either way we just have to wait for the operating system's getaddrinfo call to time out on its own (30 seconds on MacOS 11.1), and setting open_timeout has no affect on that at all.
That being the case, it would be great if we could drop the use of Timeout.timeout in favor of socket timeout options. I have a simple patch that is passing all tests, and would be happy to send in a pull request, but wanted to discuss a couple things first:
-
To exactly mimic current behavior, we'd have to track the time used up in the DNS lookup, and subtract that from the timeout we give to open the socket. Is this worth it? Is it better done in the implementation of
Socket.tcp? -
Should we take advantage of the
:resolv_timeoutavailable on systems withgetaddrinfo_a(just Linux I believe?) for Ruby >= 2.7?
This was closed by #10, correct?
yes it was. I've actually got equivalent pull requests at https://github.com/ruby/net-smtp/pull/21, https://github.com/ruby/net-pop/pull/3, and https://github.com/ruby/net-ftp/pull/5, @deivid-rodriguez any chance I could get you to give those a quick code review? That seemed to be the impetus behind getting https://github.com/ruby/net-http/pull/10 accepted...
https://github.com/ruby/net-http/pull/10 Was reverted in https://github.com/ruby/net-http/pull/74, it re-opening this issue as it would still be nice to get rid of use of the problematic Timeout.timeout for the reasons stated above
Based on @hsbt's comment here, it sounds like we were waiting for Addrinfo to support a getaddrinfo_a method. But Socket.tcp does already have a resolv_timeout option. Is it not sufficient for this use? It's the timeout being passed to Addrinfo.getaddrinfo.
In the context of this use case, is there a reason continuing to use Addrinfo.getaddrinfo with a timeout isn't sufficient?
As https://github.com/ruby/net-http/pull/74 says, resolv_timeout has no effect and so is not usable.
Just to reiterate though, Timeout.timeout has no effect on whatsoever on DNS lookups, so I'm not sure what we gained by going back to the previous implementation. Either way we can't place a timeout on DNS lookups and have to default to the operating system's timeout. See https://bugs.ruby-lang.org/issues/17528. This is the case whether we use Socket.tcp, and its still the case now that we've reverted the change:
#set the nameserver to an address that doesnt exist, then run the following command:
bin $ time ./ruby -e 'require "net/http"; c = Net::HTTP.new("www.ffefecfceaefefxx.com"); c.open_timeout = 2; response = c.request_get("/"); puts response.code'
/Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize': Failed to open TCP connection to www.ffefecfceaefefxx.com:80 (execution expired) (Net::OpenTimeout)
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/timeout.rb:189:in `block in timeout'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/timeout.rb:196:in `timeout'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1237:in `start'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1817:in `request'
from /Users/mohamed/.rvm/rubies/ruby-3.2.2/lib/ruby/3.2.0/net/http.rb:1727:in `request_get'
from -e:1:in `<main>'
real 0m30.106s
user 0m10.066s
sys 0m41.866s
If you use resolv-replace (the only way to interrupt DNS lookup currently AFAIK) Timeout.timeout will work. So that's a clear advantage.
Also see https://bugs.ruby-lang.org/issues/19430, that would solve it but it's not implemented yet.