clipper
Potential deadlock when deleting model container replica queue
A deadlock can occur when a model container replica is removed. It cripples all communication between the frontend and every model container, so any request whose result is not already in the cache times out.
This happens as follows (with `TaskExecutionThreadPool` in `src/libclipper/include/clipper/threadpool.hpp`):

- `interrupt_thread` is called, sending an interrupt to the queue's worker thread.
- `delete_queue` is called: it acquires a unique lock on `queues_mutex_`, invalidates the queue in question, then waits for the worker thread to finish.
- The worker thread blocks trying to acquire a shared lock on `queues_mutex_`.
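The interleaving above can be reproduced in isolation. The sketch below is a minimal, hypothetical model, not Clipper's code: `std::shared_timed_mutex` stands in for `queues_mutex_` (the real mutex type may differ), plain atomics stand in for the pool's interrupt mechanism, and the function name `reproduces_deadlock` is made up for illustration. So that the demo terminates, the worker uses a timed `try_lock_shared_for` instead of a blocking lock; a timeout here means a blocking worker would wait forever.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical model of the reported interleaving (not Clipper's code).
// Returns true if the worker could not obtain its shared lock while
// delete_queue was joining it, i.e. a blocking worker would deadlock.
bool reproduces_deadlock() {
  std::shared_timed_mutex queues_mutex_;  // stand-in for the pool's mutex
  std::atomic<bool> interrupted{false};   // set by "interrupt_thread"
  std::atomic<bool> unique_held{false};   // set once "delete_queue" holds the lock
  std::atomic<bool> got_shared_lock{false};

  std::thread worker([&] {
    // Wait for the interrupt and for delete_queue to take its unique lock,
    // so the demo always follows the interleaving from the report.
    while (!interrupted.load() || !unique_held.load()) {
      std::this_thread::yield();
    }
    // Step 3: the worker needs a shared lock on queues_mutex_. A blocking
    // lock would wait forever; the timeout only lets this demo finish.
    if (queues_mutex_.try_lock_shared_for(std::chrono::milliseconds(200))) {
      got_shared_lock = true;
      queues_mutex_.unlock_shared();
    }
  });

  // Step 1: interrupt_thread signals the worker.
  interrupted = true;

  // Step 2: delete_queue takes the unique lock, invalidates the queue
  // (omitted here), then joins the worker while still holding the lock.
  {
    std::unique_lock<std::shared_timed_mutex> lock(queues_mutex_);
    unique_held = true;
    worker.join();
  }
  return !got_shared_lock.load();
}
```

With a plain blocking `lock_shared()` in the worker, `worker.join()` would never return, which is exactly the hang described in this issue.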
The worker thread never acquires the shared lock because `delete_queue` holds a unique lock on `queues_mutex_`, and `delete_queue` never releases that lock because it is waiting for the worker thread to finish. After this, no task can be sent to any queue: the `submit` function blocks trying to acquire a shared lock on `queues_mutex_`, which it never gets because `delete_queue` still holds the unique lock.
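One possible way to break the cycle is for `delete_queue` to invalidate the queue while holding the unique lock, then release that lock before waiting for the worker. The sketch below illustrates that ordering under the same assumptions as a standalone model: `std::shared_timed_mutex` stands in for `queues_mutex_`, atomics stand in for the interrupt and the queue invalidation, and `delete_queue_without_deadlock` is a hypothetical name. It is not the project's actual patch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical fix sketch: invalidate under the unique lock, release it,
// then wait for the worker. Returns true once the worker has taken its
// shared lock and exited.
bool delete_queue_without_deadlock() {
  std::shared_timed_mutex queues_mutex_;  // stand-in for the pool's mutex
  std::atomic<bool> interrupted{false};   // set by "interrupt_thread"
  std::atomic<bool> invalidated{false};   // set once the queue is invalidated
  std::atomic<bool> got_shared_lock{false};

  std::thread worker([&] {
    while (!interrupted.load() || !invalidated.load()) {
      std::this_thread::yield();
    }
    // Ordinary blocking shared lock: it succeeds as soon as delete_queue
    // drops its unique lock.
    std::shared_lock<std::shared_timed_mutex> lock(queues_mutex_);
    got_shared_lock = true;
  });

  interrupted = true;  // interrupt_thread
  {
    std::unique_lock<std::shared_timed_mutex> lock(queues_mutex_);
    invalidated = true;  // invalidate the queue while holding the unique lock
  }                      // unique lock released here, before waiting
  worker.join();         // the worker can now acquire its shared lock and exit
  return got_shared_lock.load();
}
```

Because the unique lock is dropped before `join`, both the worker's shared lock and any concurrent `submit` can proceed. Whether this ordering is safe in the real pool depends on what else relies on the queue map staying locked during teardown, which this sketch does not model.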