userver
userver copied to clipboard
thread_local variables cause data races (UB) when used with userver
Minimal reproducer:
thread_local int x{};
++x;
engine::Yield();
++x;
Let's say that after the context switch, the task is scheduled on another thread.
It's expected then that after Yield
, the new instance of x
(which is local to the current thread) will be incremented. But that's not what happens! The compiler can't imagine the situation that the address of a thread_local
variable changes, so it only loads the address of x
once and caches it across the suspension point. Then it keeps using the variable of another thread, causing a data race, because another thread may use it at the same time (it has every right to do so, because it is its own instance of the thread_local
variable).
This bug has been documented with little reaction from compiler maintainers:
- https://bugs.llvm.org/show_bug.cgi?id=19177
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461
- https://github.com/ldc-developers/ldc/issues/666
Recent versions of Clang (at least Clang-14) do this optimization under LTO even when variable access happens inside a function.
Marking the variable atomic
, volatile
, adding asm volatile("" ::: "memory");
does not help.
The only thing that helps is to mark the function that accesses the variable as noinline
, as we've done in bug core: prevent clang 14 LTO UB thread_local access. We, however, cannot expect all the user and library code to do that.
A reproducer test has been added in feat engine: demonstrate that thread_local variables don't work with userver.
Note that this issue is not the same as:
thread_local int x{}; MyContainer foo(&x); engine::Yield(); foo.Frobnicate();
This code pattern is buggy in the context of userver, and is expected to be buggy, and we & our users have learned to avoid it.
We now provide compiler::ThreadLocal
that should be used in services instead of thread_local
.
However, third-party libraries still may and do use thread_local
, and until Clang supports an option for fiber-safe thread_local
optimizations, there will be no safe solution.
We've disabled LTO by default due to ongoing issues with third-party libraries. For example, when linking Jemalloc statically, the following code causes a data race:
void* smth = malloc(...);
engine::Yield();
free(smth);
We are currently discussing the possibility of implementing a "fiber-safe optimizations" flag in Clang. This will take several months.