Race conditions related to thread_local variables

Open Pansysk75 opened this issue 1 year ago • 0 comments

In various places within HPX, thread_local variables are accessed without locking or synchronization, using a pattern such as:

T& get_var(){
  thread_local T important_var{};
  return important_var;
}
// Multiple threads may concurrently run this code:
void run()
{
  {
    auto& var = get_var();
    do_A(var);
  }

  // Do things, yield, suspend, wait for future, calculate 42, etc...

  {
    auto& var2 = get_var();
    do_B(var2);
  }
}

This can be thread-safe, as long as we don't context switch (i.e. suspend the current hpx-thread) while holding a reference to the thread_local variable, because each instance of the thread_local variable is then accessed by only one thread at a time (the currently executing thread).

Suspending the hpx-thread sends it back to the thread-queue, where it may be picked up by a different OS-worker-thread. A pointer or reference to the thread_local variable that is held across the suspension still refers to the previous worker thread's instance, which that thread may be using concurrently.
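The per-thread nature of the variable can be seen without HPX at all. The following is a minimal sketch using plain `std::thread`; `int` stands in for `T`, and the helper names (`instance_address`, `value_seen_by_new_thread`, `demo`) are hypothetical and not part of HPX. It only shows that a reference obtained on one OS thread keeps naming that thread's instance, which is what makes a reference cached across a migration dangerous.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Stand-in for the accessor pattern from the issue (int replaces T).
int& get_var()
{
    thread_local int important_var = 0;
    return important_var;
}

// Address of the calling OS thread's instance, as an integer.
std::uintptr_t instance_address()
{
    return reinterpret_cast<std::uintptr_t>(&get_var());
}

// Same query, but executed on a freshly created OS thread.
std::uintptr_t instance_address_on_new_thread()
{
    std::uintptr_t addr = 0;
    std::thread t([&addr] { addr = instance_address(); });
    t.join();
    return addr;
}

// Value that a freshly created OS thread reads through get_var().
int value_seen_by_new_thread()
{
    int value = -1;
    std::thread t([&value] { value = get_var(); });
    t.join();
    return value;
}

void demo()
{
    int& cached = get_var();    // reference to this thread's instance
    cached = 7;

    // The other thread has its own zero-initialized instance.
    assert(value_seen_by_new_thread() == 0);

    // 'cached' still names the first thread's instance.
    assert(cached == 7);
    assert(instance_address() != instance_address_on_new_thread());
}
```

In HPX terms, `cached` plays the role of a reference held across a suspension: after the hpx-thread resumes on another worker thread, a fresh call to `get_var()` would return a different object than the one `cached` refers to.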

However, a malfunctioning example led us to realize that compiler optimizations had transformed the code above into something equivalent to:

void run()
{
  auto& var_ref = get_var(); // Cache address of thread_local

  {
    do_A(var_ref);
  }

  // Do things, yield, suspend, wait for hpx::future, calculate 42, etc...

  {
    do_B(var_ref);
  }
}

This transformation, in combination with the hpx-thread migrating to different worker-threads, led to unexpected race conditions on the thread_local variable.

While this was spotted in thread_local_caching_allocator, we still seek a general solution that addresses this usage pattern everywhere it occurs. As also mentioned in this related Stack Overflow question, taking a volatile pointer to the thread_local variable does not help.

I will update this issue as more is done to address it.

Pansysk75, Sep 09 '24 19:09