concurrent-ruby

Deadlock on wait_for_termination

Open EarlofLemongrab opened this issue 7 years ago • 2 comments

Hey, I have a deadlock problem. The code snippet looks like this:

thread_pool = Concurrent::FixedThreadPool.new(3)
res = Concurrent::ThreadLocalVar.new({})
while @queue.size > 0 do
  thread_pool.post do
    cur = @queue.pop
    build_res = start(cur)
    do_sth(res)
  end
end
thread_pool.shutdown
thread_pool.wait_for_termination

Then here is what I got:

/RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `sleep': No live threads left. Deadlock? (fatal)
	from /RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `wait'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `ns_wait'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/abstract_lockable_object.rb:43:in `ns_wait_until'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/atomic/event.rb:87:in `block in wait'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `block in synchronize'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/atomic/event.rb:84:in `wait'
	from RHEL5_64/DEV.STD.PTHREAD/build/private/env/ruby2.3.x/ruby2.3.x/lib/ruby/gems/2.3.0/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_executor_service.rb:49:in `wait_for_termination'
	from package_builder.rb:64:in `bulk_build'
	from lib/test.rb:152:in `<main>'
rake aborted!

Can someone help me out?

EarlofLemongrab, Apr 05 '18 01:04

Hi guys, here is an update: I found the cause of the deadlock. It was a silly one: a data race on the queue. An updated version looks like this:

begin
  thread_pool = Concurrent::FixedThreadPool.new(3)
  build_res = Concurrent::ThreadLocalVar.new({})
  while @build_queue.size > 0 do
    thread_pool.post do
      if @build_queue.size > 0
        cur_pkg = @build_queue.pop
        build_res = WeakRef.new(start_build(cur_pkg))
        cur_pkg.build_req = build_res[:req_id]
        cur_pkg.status = build_res[:status]
        cur_pkg.need_build = false
      end
    end
  end
  thread_pool.shutdown
  thread_pool.wait_for_termination
rescue Exception => e
  return
end

However, I am now running into a serious memory usage problem. The weird thing is that if the task queue (@build_queue) holds more items than I have threads, say 4, my code's memory consumption becomes huge (it keeps growing during the run until it reaches the limit). But if the queue size is less than or equal to my thread count (<= 3), memory usage is very stable and does not grow rapidly. What is the right way to consume the task queue?

EarlofLemongrab, Apr 05 '18 18:04

The usual way is to do

while job = @queue.pop
  work_on job
end

Currently there is still a data race between size and pop. Could you try that first, before we investigate the memory issue?
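
As a rough illustration of that loop, here is a minimal sketch that uses only Ruby's standard Thread and Queue rather than Concurrent::FixedThreadPool, so it can be run without the gem. The names process_job and drain_queue are hypothetical, and process_job merely stands in for start_build. It assumes jobs are never nil or false, since the loop stops on the first falsy value returned by pop.

```ruby
require 'thread' # Queue is built in on modern Rubies; harmless on older ones

# Hypothetical stand-in for start_build: returns a small result hash.
def process_job(job)
  { req_id: "req-#{job}", status: :done }
end

# Drain all jobs with a fixed number of worker threads.
def drain_queue(jobs, worker_count: 3)
  queue = Queue.new
  jobs.each { |job| queue << job }
  # Once closed, pop returns nil when the queue is empty instead of blocking.
  queue.close

  results = Queue.new # thread-safe collector for [job, result] pairs
  workers = Array.new(worker_count) do
    Thread.new do
      # pop alone decides whether there is work left; there is no separate
      # size check, so no window between "check size" and "pop".
      while (job = queue.pop)
        results << [job, process_job(job)]
      end
    end
  end
  workers.each(&:join)

  # All workers have finished, so the collector can be emptied safely.
  Array.new(results.size) { results.pop }.to_h
end

results = drain_queue((1..10).to_a)
puts results.size
```

The same worker loop could be posted to a pool once per thread instead of once per job. Because exactly worker_count tasks are created, this shape also avoids posting blocks in an open-ended loop, which may be related to the memory growth described above, though that is not confirmed here.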

pitr-ch, May 02 '18 10:05