Threadpool
Deadlock due to lack of cross thread signalling
Consider a series of enqueues to a thread pool with two threads: func1(), stalled_func(), func2(), wait a second, func3(). Execution of func3() will unstall stalled_func().
Since enqueues are round-robin, func1() and func2() are enqueued on Thread 1, and stalled_func() and func3() on Thread 2.
After func2(), Thread 1 goes to sleep in sem.acquire_many(), waiting to be signalled, since _in_flight == 0. After a second, when func3() is enqueued, it is pushed to Thread 2's queue and Thread 2's semaphore is signalled, but Thread 2 cannot execute func3() because it is still in the middle of executing stalled_func(). Thread 1 will continue to wait without stealing Thread 2's pending work.
This will cause the thread-pool to deadlock.
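The placement described above can be modelled with a few lines of Python. This is only a sketch of the round-robin assignment, not the pool's actual code: `round_robin_assign` is a hypothetical helper, and the threads are numbered from 0 here, so index 0 stands for Thread 1 and index 1 for Thread 2.

```python
def round_robin_assign(tasks, num_threads):
    """Distribute tasks over per-thread queues in strict round-robin order."""
    queues = [[] for _ in range(num_threads)]
    for i, task in enumerate(tasks):
        queues[i % num_threads].append(task)
    return queues


# The enqueue order from the scenario above, on a pool with two threads.
queues = round_robin_assign(["func1", "stalled_func", "func2", "func3"], 2)

# Index 0 (Thread 1) only ever holds short tasks and ends up asleep.
# Index 1 (Thread 2) has func3 queued behind the task that func3 is
# supposed to unstall, so neither task can make progress.
for index, queue in enumerate(queues):
    print(f"Thread {index + 1}: {queue}")
```

Because the assignment depends only on the enqueue order, func3() always lands behind stalled_func() regardless of which thread happens to be free.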
If this behaviour is not supported:
"Execution of func3() will unstall stalled_func()."
then that's fair. It's a reasonable limitation for a thread pool. The pattern does come up, though, when executing a DAG of dynamically connected tasks (e.g. reading assets from disk), where a task node can continue execution only once all of its parent nodes have finished.
I'm opening an issue for posterity if/when anyone wants to handle this case too.
Many thanks for this. I will have to have a think; this:
"This behaviour happens when executing a DAG of dynamically connected tasks"
is definitely a use case I would like to support.
Here's an idea to get started: tracking idle threads. Before a thread goes to sleep (sem.acquire_many), add it to an idle thread queue. Whenever a task needs to be enqueued, pop a thread from the idle thread queue and push the task to that thread's queue.
This works reasonably well for some thread pools I've implemented. A far more mature and advanced version of this is what rayon uses.
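The idle-thread idea can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the pool's implementation: `IdleTrackingPool`, `demo`, and the fallback to round-robin when no thread is idle are all hypothetical choices made for illustration, and a plain `threading.Semaphore` stands in for the pool's per-thread semaphore.

```python
import threading
import time
from collections import deque


class IdleTrackingPool:
    """Toy pool: one task queue per worker plus a queue of idle worker ids."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()
        self._queues = [deque() for _ in range(num_threads)]
        self._signals = [threading.Semaphore(0) for _ in range(num_threads)]
        self._idle = deque()  # ids of workers that are asleep or about to sleep
        self._next = 0        # round-robin cursor, used only when nobody is idle
        self._threads = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(num_threads)
        ]
        for thread in self._threads:
            thread.start()

    def enqueue(self, task):
        with self._lock:
            if self._idle:
                target = self._idle.popleft()  # prefer a sleeping worker
            else:
                target = self._next            # assumed fallback: round-robin
                self._next = (self._next + 1) % len(self._queues)
            self._queues[target].append(task)
        self._signals[target].release()        # one release per queued task

    def _run(self, worker_id):
        while True:
            with self._lock:
                if not self._queues[worker_id]:
                    self._idle.append(worker_id)  # register before sleeping
            self._signals[worker_id].acquire()
            with self._lock:
                task = self._queues[worker_id].popleft()
            if task is None:                      # shutdown marker
                return
            task()

    def shutdown(self, timeout=5.0):
        for worker_id in range(len(self._queues)):
            with self._lock:
                if worker_id in self._idle:
                    self._idle.remove(worker_id)
                self._queues[worker_id].append(None)
            self._signals[worker_id].release()
        for thread in self._threads:
            thread.join(timeout)


def demo():
    """Replay the scenario from this issue on a two-thread pool."""
    pool = IdleTrackingPool(2)
    unstall = threading.Event()
    done = []

    def stalled_func():
        unstall.wait()              # blocks until func3 has run
        done.append("stalled_func")

    def func3():
        unstall.set()
        done.append("func3")

    pool.enqueue(lambda: done.append("func1"))
    pool.enqueue(stalled_func)
    pool.enqueue(lambda: done.append("func2"))
    time.sleep(0.2)                 # let the free worker drain its queue and go idle
    pool.enqueue(func3)             # goes to the idle worker, not behind stalled_func
    pool.shutdown()
    return done


if __name__ == "__main__":
    print(sorted(demo()))
```

In this sketch a worker is only ever in the idle queue while its own task queue is empty, so func3() cannot be placed behind stalled_func(). It does not cover the case where every worker is busy when the unstalling task arrives; that would still need something like work stealing.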