rr
librrpreload.so pthread_mutex_lock interposition interacts badly with TSAN
A Pernosco customer reports that attempting to record a build of their application with TSAN (LLVM 13's TSAN) quickly triggers a TSAN abort due to the static limit of 64 locked mutexes per thread being exceeded.
The problem here is that TSAN intercepts both pthread_mutex_lock and __pthread_mutex_lock. TSAN's pthread_mutex_lock interceptor delegates to the "original" pthread_mutex_lock, which in this case is librrpreload's override. If DOUBLE_UNDERSCORE_PTHREAD_LOCK_AVAILABLE is true, that override calls __pthread_mutex_lock, which is again intercepted by TSAN. The call stack (outermost frame first) thus looks like:
std::__1::__libcpp_mutex_lock () at __threading_support:371
__interceptor_pthread_mutex_lock () at sanitizer_common_interceptors.inc:4251
pthread_mutex_lock () at overrides.c:100
__interceptor___pthread_mutex_lock () at sanitizer_common_interceptors.inc:4288
Both of these __interceptor-prefixed TSAN functions increment the lock count on this mutex, and the branch you would expect to barf here (https://github.com/llvm/llvm-project/blob/release/13.x/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp#L187) is empty. So every mutex leaves pthread_mutex_lock with a lock count of 2. A pthread_mutex_unlock call passes through only one TSAN interceptor, since librrpreload does not override it, and so decrements the lock count by just 1. TSAN is therefore never convinced that any lock has been unlocked, and as soon as a thread has locked 65 distinct mutexes it dies.
I think the easiest way to fix this would be to add a pthread_mutex_unlock wrapper to librrpreload that delegates to __pthread_mutex_unlock and otherwise doesn't do anything. That would likely trick TSAN into doing the right thing here, because each unlock would then pass through two interceptors and decrement the count twice. However, if TSAN ever starts validating that non-recursive locks are in fact never locked twice (which IIRC is UB per POSIX), rr will trip it.
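To make the accounting concrete, here is a toy model of the behaviour described above. It is only a sketch: it does not use TSAN's real data structures, and every name in it (`thread_model`, `model_on_lock`, `run_model`, and so on) is hypothetical. It assumes each interceptor bumps a per-mutex count, that a mutex leaves the held set only when its count reaches zero, and that the held set has the 64-entry limit mentioned in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Static per-thread limit on held mutexes, as quoted in the report. */
enum { MODEL_MAX_HELD = 64 };

typedef struct {
    const void *mutex; /* identity of the tracked mutex */
    int count;         /* how many times the model believes it is held */
} held_entry;

typedef struct {
    held_entry held[MODEL_MAX_HELD];
    size_t n_held;   /* distinct mutexes currently considered held */
    bool overflowed; /* set at the point where the real tool would abort */
} thread_model;

/* What one (hypothetical) lock interceptor does to the bookkeeping. */
void model_on_lock(thread_model *t, const void *m) {
    for (size_t i = 0; i < t->n_held; i++) {
        if (t->held[i].mutex == m) {
            t->held[i].count++; /* second lock is silently accepted */
            return;
        }
    }
    if (t->n_held == MODEL_MAX_HELD) {
        t->overflowed = true; /* a 65th distinct held mutex */
        return;
    }
    t->held[t->n_held].mutex = m;
    t->held[t->n_held].count = 1;
    t->n_held++;
}

/* What one (hypothetical) unlock interceptor does to the bookkeeping. */
void model_on_unlock(thread_model *t, const void *m) {
    for (size_t i = 0; i < t->n_held; i++) {
        if (t->held[i].mutex != m)
            continue;
        if (--t->held[i].count == 0) {
            assert(t->n_held > 0);
            t->held[i] = t->held[t->n_held - 1]; /* swap-remove */
            t->n_held--;
        }
        return;
    }
}

/* Lock path under rr: the pthread_mutex_lock interceptor runs, then the
 * preload override calls __pthread_mutex_lock, which is intercepted again. */
void app_lock(thread_model *t, const void *m) {
    model_on_lock(t, m);
    model_on_lock(t, m);
}

/* Unlock path: one interceptor today, two if the preload library also
 * wrapped pthread_mutex_unlock and forwarded to __pthread_mutex_unlock. */
void app_unlock(thread_model *t, const void *m, bool rr_wraps_unlock) {
    model_on_unlock(t, m);
    if (rr_wraps_unlock)
        model_on_unlock(t, m);
}

/* Lock and immediately unlock n distinct mutexes (capped at 256 in this
 * model); returns true if the model would have hit the limit. */
bool run_model(size_t n, bool rr_wraps_unlock) {
    static char mutexes[256]; /* addresses serve as mutex identities */
    thread_model t = {0};
    for (size_t i = 0; i < n && i < sizeof mutexes; i++) {
        app_lock(&t, &mutexes[i]);
        app_unlock(&t, &mutexes[i], rr_wraps_unlock);
    }
    return t.overflowed;
}
```

Under these assumptions, `run_model(65, false)` reports an overflow even though no mutex is ever actually held across iterations, while `run_model(n, true)` does not for any `n` the model supports. That matches the reasoning behind the proposed wrapper, but it says nothing about how TSAN would behave if it started rejecting double locks.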