net-http
net-http copied to clipboard
Starting a connection hangs indefinitely sometimes
We've been seeing some of our background jobs hang on jobs that use net-http. Here's 2 backtraces:
This is from using S3's put_object call:
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:186:in `block in timeout'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:193:in `timeout'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1243:in `start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/delegate.rb:87:in `method_missing'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/connection_pool.rb:307:in `start_session'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/connection_pool.rb:100:in `session_for'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:128:in `session'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:76:in `transmit'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/net_http/handler.rb:50:in `call'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/content_length.rb:24:in `call'
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/request_callback.rb:118:in `call'
... [ aws code ] ...
/usr/local/bundle/ruby/3.2.0/gems/aws-sdk-s3-1.133.0/lib/aws-sdk-s3/bucket.rb:682:in `put_object'
... [ our code ] ...
This is from using Faraday:
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `initialize'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `open'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1271:in `block in connect'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:186:in `block in timeout'
/usr/local/bundle/ruby/3.2.0/gems/timeout-0.4.0/lib/timeout.rb:193:in `timeout'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1269:in `connect'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/lib/fullstaq-ruby/versions/3.2.2/lib/ruby/3.2.0/net/http.rb:1237:in `start'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:146:in `request_via_request_method'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:131:in `request_with_wrapped_block'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:122:in `perform_request'
/usr/local/bundle/ruby/3.2.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:66:in `block in call'
... [ truncated ] ...
It looks to be hanging in Ruby's net/http's TCPSocket.open call.
Some things to note, in case it matters:
- We're noticing this happen in background jobs. We're using Sidekiq.
- We use the latest version of Ruby's
timeoutgem (0.4.0), which has this notable change: https://github.com/ruby/timeout/pull/30
Downgrading to timeout 0.3.2 did not help.
Went looking for anything around timeouts, and found https://github.com/ruby/net-http/issues/6 ... not sure if that is relevant though.