net-http icon indicating copy to clipboard operation
net-http copied to clipboard

Starting a connection hangs indefinitely sometimes

Open ngan opened this issue 2 years ago • 2 comments

We've been seeing some of our background jobs hang on jobs that use net-http. Here's 2 backtraces:

This is from using S3's put_object call:

/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:186:in `block in timeout'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:193:in `timeout'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1243:in `start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/delegate.rb:87:in `method_missing'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/connection_pool.rb:307:in `start_session'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/connection_pool.rb:100:in `session_for'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:128:in `session'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:76:in `transmit'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:50:in `call'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/content_length.rb:24:in `call'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/request_callback.rb:118:in `call'
...  [ aws code ] ...
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-s3-1.133.0/lib/aws-sdk-s3/bucket.rb:682:in `put_object'
... [ our code ] ...

(full backtrace here)

This is from using Faraday:

/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:186:in `block in timeout'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:193:in `timeout'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1237:in `start'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:146:in `request_via_request_method'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:131:in `request_with_wrapped_block'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:122:in `perform_request'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:66:in `block in call'
... [ truncated ] ...

It looks to be hanging in Ruby's net/http's TCPSocket.open call.

Some things to note, in case it matters:

  • We're noticing this happen in background jobs. We're using Sidekiq.
  • We use the latest version of Ruby's timeout gem (0.4.0), which has this notable change: https://github.com/ruby/timeout/pull/30

ngan avatar Aug 25 '23 01:08 ngan

Downgrading to timeout 0.3.2 did not help.

ngan avatar Aug 28 '23 18:08 ngan

Went looking for anything around timeouts, and found https://github.com/ruby/net-http/issues/6 ... not sure if that is relevant though.

technicalpickles avatar Sep 12 '23 13:09 technicalpickles