thread-pool icon indicating copy to clipboard operation
thread-pool copied to clipboard

Synchronization on shutdown.

Open l90lpa opened this issue 7 years ago • 5 comments

I have noticed that there seems to be a race condition in the shutdown process of ThreadPool. By example let Thread1 be the main thread executing shutdown() and let Thread2 be a ThreadWorker executing operator(). If we take the following interleaving of calls (assuming that the queue is empty by the start):

Thread2: checks the while condition and returns true. Thread1: sets m_shutdown to false. Thread1: calls m_conditional_lock.notify_all(). Thread2: takes ownership of m_pool->m_conditional_mutex. Thread2: call m_pool->m_conditional_lock.wait( lock ).

In the above scenario it can be seen that the notify_all() is completed before Thread2 calls wait which means that the ThreadWorker remains waiting indefinitely (unless a spurious wake-up occurs) and so the call to join() in shutdown() can't be completed stopping the shutdown of the thread pool. One suggestion to fix this would be to provide a predicate to the wait(lock), such as m_pool->m_conditional_lock.wait( lock , [this]{ return (!this->m_pool->m_queue.empty()) || this->m_pool->m_shutdown;});

l90lpa avatar Jun 11 '18 14:06 l90lpa

I don't think that this is sufficient. The wait predicate AFAIK is to guard against spurious wakeup. It's probably recommended, but I don't think that it solves the race condition.

Your call operator should look more like this:

void operator()()
{
  std::unique_lock<std::mutex> lock(m_pool->m_conditional_mutex);
  while (!m_pool->m_shutdown) {
  {
    if (m_pool->m_queue.empty()) {
      m_pool->m_conditional_lock.wait(lock, [this] { return this->m_pool->m_shutdown || this->m_pool->m_queue.empty() == false; });
    }
    std::function<void()> func;
    if (m_pool->m_queue.dequeue(func)) {
      func();
    }
  }
}

The problem described is then fixed by locking the mutex in your shutdown() function. Note that wait() releases the lock when called and acquires the lock when woken. You need to acquire the lock in order to prevent wait() being called after the call to notify_all() in shutdown().

void shutdown()
{
  {
    std::lock_guard<std::mutex> lock(m_conditional_mutex);
    m_shutdown = true;
    m_conditional_lock.notify_all();
  }
  
  for (int i = 0; i < m_threads.size(); ++i)
  {
    if (m_threads[i].joinable())
    {
      m_threads[i].join();
    }
  }
}

Similarly, in submit(), you need to acquire the lock before calling notify_all() to prevent notify_all() being called between checking whether the queue is empty and calling wait().

{
  std::lock_guard<std::mutex> lock(m_conditional_mutex);
  m_conditional_lock.notify_one();
}

AccessViolationEnjoyer avatar Jan 16 '19 18:01 AccessViolationEnjoyer

https://stackoverflow.com/questions/35775501/c-should-condition-variable-be-notified-under-lock

yvoinov avatar Jan 16 '19 22:01 yvoinov

https://stackoverflow.com/questions/17101922/do-i-have-to-acquire-lock-before-calling-condition-variable-notify-one/17102100

yvoinov avatar Jan 16 '19 22:01 yvoinov

@yvoinov In the case of submit(), if we don't lock the mutex then I believe that notify_one() could be called between checking whether the queue is empty and wait(). In that case, we will wait forever, or until something else notifies the condition variable. I placed the lock to ensure that we only notify if the other thread is currently waiting. If you don't lock then how else do you prevent this race condition?

https://stackoverflow.com/questions/20982270/sync-is-unreliable-using-stdatomic-and-stdcondition-variable

AccessViolationEnjoyer avatar Jan 16 '19 23:01 AccessViolationEnjoyer

@yvoinov In the case of submit(), if we don't lock the mutex then I believe that notify_one() could be called between checking whether the queue is empty and wait(). In that case, we will wait forever, or until something else notifies the condition variable. I placed the lock to ensure that we only notify if the other thread is currently waiting. If you don't lock then how else do you prevent this race condition?

https://stackoverflow.com/questions/20982270/sync-is-unreliable-using-stdatomic-and-stdcondition-variable

After a lot of testing, I'm using quite different thread pool implementation:

https://github.com/yvoinov/thread-pool-cpp

There is no race condition on submit. Without lock notify_*. This implementation works perfectly, no deadlocks, no races.

To prevent lost wakeup it is enough to lock conditional variable mutex during notify_one(). It described here:

http://www.modernescpp.com/index.php/c-core-guidelines-be-aware-of-the-traps-of-condition-variables

yvoinov avatar Jan 17 '19 11:01 yvoinov