Threadpool icon indicating copy to clipboard operation
Threadpool copied to clipboard

Deadlock due to lack of cross thread signalling

Open Squadrick opened this issue 2 years ago • 2 comments

Consider a series of enqueues to a thread pool with two threads: func1(), stalled_func(), func2(), wait a second, func3(). Execution of func3() will unstall stalled_func().

Since enqueue are round-robin, func1() and func2() are enqueued of Thread 1, stalled_func() and func3() on Thread 2.

After func2(), Thread 1 goes to sleep waiting to be signalled sem.acquire_many(), since _in_flight == 0. After a second, when func3() is enqueued, it is pushed to Thread 2's queue and its semaphore is signalled, but it cannot execute func3() since it is in the middle of executing stalled_func(). Thread 1 will continue to wait without stealing Thread 2's pending work.

This will cause the thread-pool to deadlock.


If this behaviour is not supported:

Execution of func3() will unstall stalled_func().

then, that's fair. It's a decent limitation for a thread pool. This behaviour happens when executing a DAG of dynamically connected tasks (e.g.: reading assets from disk). A task node can continue execution only if all its parent nodes have finished execution.

I'm opening an issue for posterity if/when anyone wants to handle this case too.

Squadrick avatar Feb 02 '23 11:02 Squadrick

Many thanks for this, I will have to have a think, this:

This behaviour happens when executing a DAG of dynamically connected tasks

is definitly a use case I would like to support.

ConorWilliams avatar Feb 03 '23 09:02 ConorWilliams

Here's an idea to get started: Tracking idle threads. Before a thread goes to sleep (sem.acquire_many), add it to an idle thread queue. Whenever a task needs to be enqueued, pop a thread from the idle thread queue and push it to that thread.

This works reasonably well for some thread pools I've implemented. A very mature, advanced version of this is what is used by rayon.

Squadrick avatar Feb 05 '23 09:02 Squadrick