Race conditions related to thread_local variables
In various places within HPX, thread_local variables are accessed without locking or synchronization, using a pattern such as:
```cpp
T& get_var()
{
    thread_local T important_var{};
    return important_var;
}

// Multiple threads may concurrently run this code:
void run()
{
    {
        auto& var = get_var();
        do_A(var);
    }
    // Do things, yield, suspend, wait for future, calculate 42, etc...
    {
        auto& var2 = get_var();
        do_B(var2);
    }
}
```
This can be thread-safe as long as we do not context switch (i.e., suspend the current hpx-thread) while holding a reference to the thread_local variable, because each instance of the thread_local variable is then accessed by only one thread at a time: the currently executing one.
Suspending the hpx-thread sends it back to the thread queue, where it may be picked up by a different OS worker thread. Holding a pointer or reference to the thread_local variable across the suspension can therefore cause trouble, because it still refers to the previous worker thread's instance, which that worker may meanwhile be using on behalf of another hpx-thread.
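The per-thread behaviour this argument relies on can be checked outside HPX. The following is a minimal sketch using plain `std::thread` rather than hpx-threads; `int` stands in for `T`, and `address_on_new_thread` and `instances_are_per_thread` are hypothetical helpers written only for illustration. It shows that a separate OS thread gets its own instance of the thread_local, which is why a reference taken on one worker thread keeps pointing at that worker's instance.

```cpp
#include <cstdint>
#include <thread>

// Simplified stand-in for the issue's get_var(), with int in place of T.
int& get_var()
{
    thread_local int important_var{};
    return important_var;
}

// Hypothetical helper: returns the address of get_var()'s instance as seen
// from a freshly started OS thread. The address is converted to an integer
// inside that thread because its instance is destroyed when the thread exits.
std::uintptr_t address_on_new_thread()
{
    std::uintptr_t addr = 0;
    std::thread worker([&addr] {
        addr = reinterpret_cast<std::uintptr_t>(&get_var());
    });
    worker.join();
    return addr;
}

// Hypothetical helper: true if the calling thread and a separate OS thread
// see different instances of the thread_local variable.
bool instances_are_per_thread()
{
    auto here = reinterpret_cast<std::uintptr_t>(&get_var());
    return here != address_on_new_thread();
}
```

On a conforming implementation `instances_are_per_thread()` returns `true`, while repeated calls to `get_var()` on the same thread return the same object.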
However, a malfunctioning example led us to realize that compiler optimizations had transformed the previous code into something equivalent to:
```cpp
void run()
{
    auto& var_ref = get_var(); // Cache address of thread_local
    {
        do_A(var_ref);
    }
    // Do things, yield, suspend, wait for hpx::future, calculate 42, etc...
    {
        do_B(var_ref);
    }
}
```
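From the compiler's point of view this is a legal optimization: in the C++ abstract machine a function runs on a single thread from start to finish, so the address of a thread_local variable cannot change within it and may be computed once. Combined with the hpx-thread migrating to a different worker thread, this transformation led to unexpected race conditions: after resumption, `do_B` operates on the previous worker thread's instance of the thread_local variable.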
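The effect of the transformed `run()` can be reproduced without HPX by simulating the migration. This is only a model under stated assumptions: the "resumption on another worker thread" is played by a `std::thread`, `int` stands in for `T`, and `simulate_cached_reference` and `MigrationResult` are hypothetical names. The original thread is kept blocked in `join()` so the demo itself stays race-free; in HPX that worker would instead be running other hpx-threads that use the same instance concurrently, which is where the race comes from.

```cpp
#include <thread>

// Simplified stand-in for the issue's get_var(), with int in place of T.
int& get_var()
{
    thread_local int important_var{};
    return important_var;
}

// Hypothetical result type for the simulation below.
struct MigrationResult
{
    int original_thread_value; // caller's instance after the "resumed" half ran
    int resumed_thread_value;  // instance owned by the thread that resumed
};

// Models the transformed run(): the reference is obtained once, before the
// suspension point, and reused after the task has "migrated" to another OS
// thread. No HPX APIs are used; std::thread only simulates the migration.
MigrationResult simulate_cached_reference()
{
    get_var() = 0;
    int& var_ref = get_var(); // cached before the suspension point
    var_ref += 1;             // do_A(var_ref) on the original thread

    int resumed_value = -1;
    std::thread resumed([&] {
        var_ref += 10;             // do_B(var_ref): still the original thread's instance
        resumed_value = get_var(); // this thread's own instance, never touched
    });
    resumed.join(); // keeps the original thread idle, so there is no data race here

    return {get_var(), resumed_value};
}
```

`simulate_cached_reference()` returns `{11, 0}`: both updates land in the original thread's instance, while the instance belonging to the thread that resumed the task is never used.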
Although this was spotted in `thread_local_caching_allocator`, we are still seeking a general solution for this usage pattern. As also mentioned in this related Stack Overflow question, taking a volatile pointer to the thread_local does not help.
I will update this issue as more is done to address it.