oneTBB
Stall with high number of threads and CPU load on task_arena::enqueue
Hi,
From time to time, we experience stalls on task_arena::enqueue in a thread-intensive application. I spent some time diagnosing the issue. It seems to come from how task_stream::push and task_stream::pop work: essentially, they cycle through all lanes trying to find one that is not locked and not empty.
During my stalls, I observe that:
- The main thread is not running
- Some of the worker threads are spinning through the lane list trying to pop a task without success
Since the non-empty state of the queue is visible through the atomic population variable before the lane lock is released, there is a window, when all lanes are empty and a thread is pushing a new item, during which the atomic has already been updated but the lock is not yet released. In that window, pop behaves like a spin lock. My guess is that this leads to the typical spin-lock problem: the thread holding the lock is not running because the system scheduler is running the compute-intensive spinning threads instead.
This scenario sounds unlikely, but we enqueue many times over the lifetime of the application, and often while the lanes are empty. It also matches my debugging and VTune profiling of the issue perfectly.
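To make the described pattern concrete, here is a minimal standalone sketch of it. The names (`toy_task_stream`), the lane count, the `std::mutex` locks, and the `population` bitmask are illustrative assumptions for this sketch, not the actual oneTBB implementation; it only shows how a pop loop that keys off an atomic "non-empty" flag can spin on a lane whose lock is still held by the pusher.

```cpp
#include <atomic>
#include <deque>
#include <mutex>
#include <optional>

struct toy_task_stream {
    static constexpr unsigned N = 8;          // number of lanes (arbitrary for this sketch)

    struct lane_t {
        std::mutex lock;
        std::deque<int> tasks;
    };

    std::atomic<unsigned> population{0};      // one bit per non-empty lane
    lane_t lanes[N];

    void push(int task, unsigned lane) {
        std::lock_guard<std::mutex> guard(lanes[lane].lock);
        lanes[lane].tasks.push_back(task);
        // The "non-empty" state becomes visible here, while the lane lock
        // is still held; a concurrent pop() already observes !empty().
        population.fetch_or(1u << lane, std::memory_order_release);
    }

    bool empty() const {
        return population.load(std::memory_order_acquire) == 0;
    }

    std::optional<int> pop() {
        std::optional<int> popped;
        unsigned lane = 0;
        // Cycle through the lanes while the stream claims to be non-empty.
        // If the only non-empty lane is still locked by the pushing thread,
        // this loop degenerates into a spin lock on that lane.
        do {
            lane = (lane + 1) % N;
            if (lanes[lane].lock.try_lock()) {
                if (!lanes[lane].tasks.empty()) {
                    popped = lanes[lane].tasks.front();
                    lanes[lane].tasks.pop_front();
                    if (lanes[lane].tasks.empty())
                        population.fetch_and(~(1u << lane), std::memory_order_release);
                }
                lanes[lane].lock.unlock();
            }
        } while (!empty() && !popped);
        return popped;
    }
};
```

If the pushing thread is preempted between `push_back` and releasing the lock (or, as in the reported case, right after updating `population` but before the unlock), every worker in `pop()` keeps spinning without making progress, which is the stall observed above.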
Good catch! Thank you for the investigation.
Can you try a workaround? Add a backoff strategy to the pop method (task_stream.h:196-199):
```diff
- do {
-     lane = next_lane( /*out_of=*/N );
-     __TBB_ASSERT( lane < N, "Incorrect lane index." );
- } while( !empty() && !(popped = try_pop( lane )) );
+ for (atomic_backoff b; !empty() && !popped; b.pause()) {
+     lane = next_lane( /*out_of=*/N);
+     __TBB_ASSERT(lane < N, "Incorrect lane index.");
+     popped = try_pop(lane);
+ }
```
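For context on why this helps: as I understand the TBB internals, atomic_backoff pauses for a progressively longer time on each call and eventually yields the core to the OS scheduler, so the spinning worker stops starving the thread that still holds the lane lock. A minimal standalone sketch of that idea (the class name, the pause threshold, and the x86 pause intrinsic below are illustrative assumptions, not the oneTBB implementation):

```cpp
#include <thread>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
  #include <immintrin.h>
  inline void machine_pause() { _mm_pause(); }   // hint to the CPU that we are spin-waiting
#else
  inline void machine_pause() { std::this_thread::yield(); }
#endif

// Illustrative exponential backoff, shaped like the helper used in the diff above.
class toy_backoff {
    int count_ = 1;
    static constexpr int loops_before_yield_ = 16;   // arbitrary cutoff for this sketch
public:
    void pause() {
        if (count_ <= loops_before_yield_) {
            for (int i = 0; i < count_; ++i) machine_pause();
            count_ *= 2;                              // spin a bit longer next time
        } else {
            // Stop burning the core: let the OS run the thread that holds the lock.
            std::this_thread::yield();
        }
    }
};
```

The important part for this stall is the final yield: once the spinner gives up its time slice, the scheduler can run the pushing thread, which then releases the lane lock and the pop can succeed.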
I'll try that now. I tried adding a simple counter to break out of the loop after a certain number of iterations, and that fixes the issue. I'll be back shortly with the backoff change.
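A rough sketch of what such a counter-based break could look like, written against the same loop and locals as the diff above (the retry limit of 1000 is an arbitrary illustrative value, not the one used in the experiment):

```cpp
// Hypothetical bounded-retry variant of the pop loop, for illustration only.
// Uses the same locals (lane, popped) and helpers as the original code.
int attempts = 0;
do {
    lane = next_lane( /*out_of=*/N );
    __TBB_ASSERT( lane < N, "Incorrect lane index." );
} while( !empty() && !(popped = try_pop( lane )) && ++attempts < 1000 );
```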
Yep, adding the backoff fixes the issue too.
@alexey-katranov, does it make sense to make a PR with these changes?
It seems so, yes.