concurrent-ruby icon indicating copy to clipboard operation
concurrent-ruby copied to clipboard

CachedThreadPool does not spin down idle threads

Open SamSaffron opened this issue 11 months ago • 13 comments

In the Ruby implementation:

require 'concurrent'

puts "Thread Count: #{Thread.list.count}"

pool = Concurrent::CachedThreadPool.new( min_threads: 0, max_threads: 5, idletime: 1 )

puts "Thread Count: #{Thread.list.count}"
# we have 1 thread for now

5.times do
  pool.post do
    puts "Hello from thread #{Thread.current.object_id}"
  end
end
# we now have 5

sleep 2
puts "Thread Count: #{Thread.list.count}"
# we still have 5

pool.post do
  puts "Hello from thread #{Thread.current.object_id}"
end

sleep 1

puts "Thread Count: #{Thread.list.count}"
# we just issued a prune as a side effect so we have 2

pool.prune_pool
sleep 2
# we need to sleep cause prune_pool is async

puts "Thread Count: #{Thread.list.count}"
# we now have 1 thread

CachedThreadPool only spins down threads when you queue work, so if you are done queuing work you need to remember to wait for it to be done and then issue a #prune_pool call

This is technically very tricky cause you need to carry an extra thread for that. The Ruby design should change so threads just wait idle_time for new work and spin down if no new work arrives.

SamSaffron avatar Dec 19 '24 03:12 SamSaffron

What's the cost/downside you're seeing with a sleeping thread on Queue#pop?

To prune precisely I think we have a couple of options:

  • another thread to sleep and prune. Downside as you note is another thread.
  • replace the Queue object with a queue-with-wait implementation (I don't think Concurrent Ruby has one, though TimerSet is close it) so the thread could prune itself.

bensheldon avatar Dec 19 '24 05:12 bensheldon

@bensheldon mainly complexity, it is complexity we do not want to carry.

I wrote a queue with a conditional var here:

https://github.com/discourse/discourse/pull/30364

I am more than happy for this to be lifted into concurrent ruby, it has a few tiny discoursisms (multisite support and logging patterns) but is safe.

For us this is perfect, I have not benched it against the current implementation but for our usage we always tend to make DB calls so we are not optimising for microsecond execution times shortest would be usually 10s of milliseconds.

SamSaffron avatar Dec 19 '24 09:12 SamSaffron

I wrote a queue with a conditional var here:

FWIW I also recall writing one in https://github.com/ruby/timeout/blob/607d8c6fbe4d86db1cf22846e89198b47cec7161/lib/timeout.rb#L105-L109 And since Ruby 3.2 there is Queue#pop(timeout:).

eregon avatar Dec 19 '24 16:12 eregon

@eregon thanks for the tip, I ended up following up with a hybrid approach

https://github.com/discourse/discourse/pull/30392/files

There is so much detail in such a small amount of code. being able to efficiently scale up the queue is very tough, coordinating work between all the critical sections is tough.

I am not even sure if this is my last PR refining this ...

Would love it if you can have a look and let me know if this is something we should port to concurrent, having a building block like this is super important, I want it to work for everyone

SamSaffron avatar Dec 19 '24 21:12 SamSaffron

I think having the improvement to automatically spin down idle threads in CachedThreadPool would make a lot of sense. The only thing is I don't have much time to review, but other maintainers might have more.

It seems the Java version does that: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool--

Threads that have not been used for sixty seconds are terminated and removed from the cache.

So it would be good for consistency too.

eregon avatar Dec 26 '24 12:12 eregon

Hey folks, I was pointed this way after opening a PR to fix a similar issue with ThreadPoolExecutor. I ended up adding another thread, however I'm open to other ideas. This seems worth exploring 🤔:

replace the Queue object with a queue-with-wait implementation (I don't think Concurrent Ruby has one, though TimerSet is close it) so the thread could prune itself.

Context:

  • Rails PR: https://github.com/rails/rails/pull/53211
  • CR issue: https://github.com/ruby-concurrency/concurrent-ruby/issues/1066
  • CR PR: https://github.com/ruby-concurrency/concurrent-ruby/pull/1079

joshuay03 avatar Jan 17 '25 10:01 joshuay03

TIL about Queue#pop(timeout:) after Ruby 3.2. That is soo nice!

...but also presents the challenge that it becomes simple (imo) to implement spinning down idle threads in Ruby 3.2+... and somewhat complex to do so in older versions of Ruby (assuming that a timeoutable-queue-pop is the strategy).

I sorta see two paths, and I'd love feedback about which to pick:

  • Have idle thread spin-down be a Ruby 3.2+ only feature
  • Have a Concurrent::QueueWithPopTimeout that implements the feature on older Ruby and inherits from Queue for Ruby 3.2+

bensheldon avatar Jan 18 '25 17:01 bensheldon

There is a 100% working implementation using ConditionVariable.new right here:

https://github.com/discourse/discourse/blob/main/lib/scheduler/thread_pool.rb

No background thread needed, Queue timeout is awesome, but this achieves the exact same goal in a backwards compatible way.

The challenge here imo is not the implementation, it is more the architecture of Concurrent rb. Given its Java roots there are lots of abstractions at play and conditional Java code baked in. It makes it very complicated to port in a solution like this despite it being pretty much ready.

SamSaffron avatar Jan 19 '25 22:01 SamSaffron

Before @SamSaffron's last message I managed to get a working implementation using Queue#pop(timeout:) which I've pushed here: https://github.com/ruby-concurrency/concurrent-ruby/compare/master...joshuay03:concurrent-ruby:better-ruby-thread-pool-executor-pruning

Seeing desirable results both with my script (failing to scale up) and @SamSaffron's (failing to scale down).

Shouldn't be too much effort to replace that with a ConditionVariable, I can look into that soon.

joshuay03 avatar Jan 26 '25 11:01 joshuay03

Yeah a timeout pop is easy to do with a ConditionVariable, see https://github.com/ruby-concurrency/concurrent-ruby/issues/1075#issuecomment-2555040044. And I think it'd make sense to use Queue#pop(timeout:) on 3.2+ as it should be quite a bit more efficient (ConditionVariable + Mutex is pretty heavy). I wouldn't expose that as a public class but an internal utility.

eregon avatar Jan 27 '25 12:01 eregon

Okay I'm pretty happy with where https://github.com/ruby-concurrency/concurrent-ruby/pull/1082 is currently at. Results are looking good with both automated and manual testing. However, the specs are quite flakey on just CRuby 2.3 to 2.5... 👀 I'd appreciate some help with that!

joshuay03 avatar Feb 15 '25 00:02 joshuay03

However, the specs are quite flakey on just CRuby 2.3 to 2.5... 👀 I'd appreciate some help with that!

Figured it out, a not so obvious race condition... It's ready for review.

joshuay03 avatar Feb 22 '25 01:02 joshuay03

As a note, help to review https://github.com/ruby-concurrency/concurrent-ruby/pull/1082 would be welcome, especially since I won't have time to review it in details soon.

eregon avatar Mar 28 '25 20:03 eregon