userver icon indicating copy to clipboard operation
userver copied to clipboard

thread_local variables cause data races (UB) when used with userver

Open Anton3 opened this issue 2 years ago • 3 comments

Minimal reproducer:

thread_local int x{};

++x;
engine::Yield();
++x;

Let's say that after the context switch, the task is scheduled on another thread.

It's expected then that after Yield, the new instance of x (which is local to the current thread) will be incremented. But that's not what happens! The compiler can't imagine the situation that the address of a thread_local variable changes, so it only loads the address of x once and caches it across the suspension point. Then it keeps using the variable of another thread, causing a data race, because another thread may use it at the same time (it has every right to do so, because it is its own instance of the thread_local variable).

This bug has been documented with little reaction from compiler maintainers:

  • https://bugs.llvm.org/show_bug.cgi?id=19177
  • https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461
  • https://github.com/ldc-developers/ldc/issues/666

Recent versions of Clang (at least Clang-14) do this optimization under LTO even when variable access happens inside a function.

Marking the variable atomic, volatile, adding asm volatile("" ::: "memory"); does not help.

The only thing that helps is to mark the function that accesses the variable as noinline, as we've done in bug core: prevent clang 14 LTO UB thread_local access. We, however, cannot expect all the user and library code to do that.

A reproducer test has been added in feat engine: demonstrate that thread_local variables don't work with userver.

Note that this issue is not the same as:

thread_local int x{};

MyContainer foo(&x);
engine::Yield();
foo.Frobnicate();

This code pattern is buggy in the context of userver, and is expected to be buggy, and we & our users have learned to avoid it.

Anton3 avatar Jan 25 '23 15:01 Anton3

We now provide compiler::ThreadLocal that should be used in services instead of thread_local.

However, third-party libraries still may and do use thread_local, and until Clang supports an option for fiber-safe thread_local optimizations, there will be no safe solution.

Anton3 avatar Jan 04 '24 16:01 Anton3

We've disabled LTO by default due to ongoing issues with third-party libraries. For example, when linking Jemalloc statically, the following code causes a data race:

void* smth = malloc(...);
engine::Yield();
free(smth);

We are currently discussing the possibility of implementing a "fiber-safe optimizations" flag in Clang. This will take several months.

Anton3 avatar Mar 25 '24 13:03 Anton3