async icon indicating copy to clipboard operation
async copied to clipboard

Tasks signaling Conditions leave suspended Fibers behind

Open rmosolgo opened this issue 1 year ago • 2 comments

I've learned that left-over, suspended fibers can retain ActiveRecord connections, causing the connection pool to run out of available connections (example: https://github.com/rmosolgo/graphql-ruby/issues/4739#issuecomment-1866671563).

So, I added a test to my project that makes sure that all Fibers created during execution end up not alive?. It currently doesn't pass when I use async as the "backend" for GraphQL::Dataloader. Here's the test: https://github.com/rmosolgo/graphql-ruby/blob/e60d22b138d91c0193e2b24eeb7b20b60648698f/spec/graphql/dataloader_spec.rb#L1020-L1026

In that test, I have to pause GC in order to catch the suspended fiber, because otherwise it will get GC'ed, since the task assigns @fiber = nil.

But, in Rails, that suspended Fiber can be the "owner" of an ActiveRecord connection, and since it's suspended, ActiveRecord thinks that it might still use the connection -- and therefore doesn't reuse the connection 😖 .

Here's a small replication (the condition.signal seems to be the special sauce):

require "bundler/inline"

gemfile do
  gem "async", "~>2.6"
  gem "sqlite3", "~>1.7"
  gem "rails", "~>7.1"
end

require "active_record"

ActiveSupport::IsolatedExecutionState.isolation_level = :fiber
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: "__example.db")
ActiveRecord::Schema.define do
  self.verbose = false
  create_table :things, force: true do |t|
    t.column :name, :string
  end
end

class Thing < ActiveRecord::Base
end

Thing.create!(name: "Pogo Stick")

pp ActiveRecord::Base.connection_pool.stat

def run_process(number)
  puts "#{number} - begin process"
  Sync do |root_task|
    condition = Async::Condition.new
    t1 = root_task.async do |inner_task|
      condition.wait
    end

    root_task.async do |inner_task|
      Thing.all.to_a
      condition.signal
    end.wait

    t1.wait
  end
  puts "#{number} - end process"
end


run_process(1)
run_process(2)
run_process(3)

# For clarity's sake, remove connections which ActiveRecord knows aren't in use:
ActiveRecord::Base.connection_pool.reap

puts "\n\nSome AR connections are in use, even though the process exited:"
pp ActiveRecord::Base.connection_pool.connections.map { |c|
  { in_use: !!c.in_use?, owner_alive?: c.owner&.alive?, owner: c.owner,  }
}
# For exmaple:
# [{:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000107a1f1f0 (resumed)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b845060 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b8e4020 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b80ded0 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>}]

run_process(4)
# This process will fail to acquire a connection:
run_process(5)
run_process(6)

It prints like this:

$ ruby async_issue.rb
/Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/io-event-1.3.3/lib/io/event/support.rb:24: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
{:size=>5, :connections=>1, :busy=>1, :dead=>0, :idle=>0, :waiting=>0, :checkout_timeout=>5.0}
1 - begin process
1 - end process
2 - begin process
2 - end process
3 - begin process
3 - end process


Some AR connections are in use, even though the process exited:
[{:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000101b2f2a8 (resumed)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105d11ce8 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105f18d98 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105f13b40 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>}]
4 - begin process
4 - end process
5 - begin process

(It never exits and for some reason it also doesn't raise with a timeout error 🤷 )

I think it would be better if this Fiber was terminated, so that ActiveRecord could reap the connection it was using. What do you think?

rmosolgo avatar Dec 27 '23 21:12 rmosolgo

Thanks for the report, I agree we should do something, but exactly what I'm not sure yet.

ioquatix avatar Dec 27 '23 21:12 ioquatix

Thanks for taking a look. I also just saw your answer over at https://bugs.ruby-lang.org/issues/20081 (thanks for that, too 😅 ). That ensure suggestion makes this script work, too:

+ ensure 
    condition.signal 

I'm going to see if I can integrate that into my GraphQL-Ruby code in the meantime 🍻 !

rmosolgo avatar Dec 27 '23 21:12 rmosolgo