librrpreload.so pthread_mutex_lock interposition interacts badly with TSAN

Open khuey opened this issue 2 years ago • 0 comments

A Pernosco customer reports that attempting to record a build of their application with TSAN (LLVM 13's TSAN) quickly triggers a TSAN abort due to the static limit of 64 locked mutexes per thread being exceeded.

The problem here is that TSAN intercepts both pthread_mutex_lock and __pthread_mutex_lock. TSAN will delegate to the "original" pthread_mutex_lock which in this case is librrpreload's override. If DOUBLE_UNDERSCORE_PTHREAD_LOCK_AVAILABLE is true, that will call __pthread_mutex_lock, which is again intercepted by TSAN. The call stack thus looks like:

std::__1::__libcpp_mutex_lock () at __threading_support:371
__interceptor_pthread_mutex_lock () at sanitizer_common_interceptors.inc:4251
pthread_mutex_lock () at overrides.c:100
__interceptor___pthread_mutex_lock () at sanitizer_common_interceptors.inc:4288

Both of these __interceptor-prefixed TSAN functions increment the lock count on this mutex, and the branch you would expect to barf here (https://github.com/llvm/llvm-project/blob/release/13.x/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp#L187) is empty. So every mutex leaves pthread_mutex_lock with a lock count of 2. librrpreload has no pthread_mutex_unlock override, so a pthread_mutex_unlock call passes through only one TSAN interceptor and decrements the lock count by 1. TSAN is therefore never convinced that any lock has been released, and as soon as a thread locks its 65th distinct mutex it dies.
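To make the accounting concrete, here is a toy model in C. It is not TSAN or rr code: every name in it (`model_note_lock`, `model_run_pairs`, and so on) is made up for illustration, and the only facts taken from the report are the limit of 64, the two increments per lock, and the single decrement per unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are TSAN or rr APIs. */
enum { MODEL_MAX_HELD = 64 };  /* per-thread limit quoted in the report */
enum { MODEL_MUTEXES = 128 };  /* arbitrary size of the toy mutex table */

static int lock_count[MODEL_MUTEXES]; /* modeled per-mutex lock count */

/* Number of mutexes the model still considers held by this thread. */
int model_held(void) {
    int held = 0;
    for (int m = 0; m < MODEL_MUTEXES; m++)
        if (lock_count[m] > 0)
            held++;
    return held;
}

/* Bookkeeping done by ONE lock interceptor. Returns false where the model
 * would exceed the limit (the point at which the real tool aborts). */
static bool model_note_lock(int m) {
    assert(m >= 0 && m < MODEL_MUTEXES);
    if (lock_count[m] == 0 && model_held() == MODEL_MAX_HELD)
        return false;
    lock_count[m]++;
    return true;
}

/* Bookkeeping done by ONE unlock interceptor. */
static void model_note_unlock(int m) {
    assert(m >= 0 && m < MODEL_MUTEXES);
    if (lock_count[m] > 0)
        lock_count[m]--;
}

/* Lock path from the call stack above: two interceptors fire. */
static bool model_lock(int m) {
    return model_note_lock(m)   /* __interceptor_pthread_mutex_lock */
        && model_note_lock(m);  /* __interceptor___pthread_mutex_lock */
}

/* Unlock path today: only one interceptor fires. */
static void model_unlock(int m) {
    model_note_unlock(m);
}

/* Lock and immediately unlock n distinct mutexes, starting from a clean
 * state. Returns how many pairs completed before the limit was hit. */
int model_run_pairs(int n) {
    assert(n >= 0 && n <= MODEL_MUTEXES);
    for (int m = 0; m < MODEL_MUTEXES; m++)
        lock_count[m] = 0;
    for (int m = 0; m < n; m++) {
        if (!model_lock(m))
            return m;
        model_unlock(m);
    }
    return n;
}
```

In this model, `model_run_pairs(64)` completes all 64 pairs but leaves `model_held()` at 64 even though every mutex was unlocked, and `model_run_pairs(65)` stops at the 65th mutex.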

I think the easiest way to fix this would be to add a pthread_mutex_unlock wrapper to librrpreload that delegates to __pthread_mutex_unlock and otherwise doesn't do anything. That would likely trick TSAN into doing the right thing here, since each unlock would then be counted twice as well. The catch is that if TSAN ever starts validating that non-recursive locks are in fact never locked twice (which IIRC is UB per POSIX), rr will trip it.
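A rough way to sanity-check that idea is another toy model in C. It is not the actual librrpreload change. All names are hypothetical, and it assumes TSAN intercepts __pthread_mutex_unlock the same way it intercepts __pthread_mutex_lock, which the report does not confirm. The `strict` flag stands in for the stricter validation speculated about above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the names below are hypothetical, not TSAN or rr APIs. */
struct model_mutex {
    int count;      /* modeled lock count */
    bool recursive; /* whether relocking by the owner is legal */
};

/* Bookkeeping done by one lock interceptor. With strict == true the model
 * rejects a second lock of a non-recursive mutex. */
static bool note_lock(struct model_mutex *m, bool strict) {
    if (strict && !m->recursive && m->count > 0)
        return false;
    m->count++;
    return true;
}

/* Bookkeeping done by one unlock interceptor. */
static void note_unlock(struct model_mutex *m) {
    if (m->count > 0)
        m->count--;
}

/* Lock path through rr's override: both lock interceptors run. */
static bool lock_via_override(struct model_mutex *m, bool strict) {
    return note_lock(m, strict) && note_lock(m, strict);
}

/* Unlock path with the proposed wrapper: the outer interceptor runs, then
 * the wrapper's call to __pthread_mutex_unlock is (by assumption)
 * intercepted a second time. */
static void unlock_via_wrapper(struct model_mutex *m) {
    note_unlock(m);
    note_unlock(m);
}

/* Residual lock count after one lock/unlock pair on a non-recursive mutex,
 * or -1 if the strict check fired during locking. */
int residual_after_pair(bool with_wrapper, bool strict) {
    struct model_mutex m = { 0, false };
    if (!lock_via_override(&m, strict))
        return -1;
    if (with_wrapper)
        unlock_via_wrapper(&m);
    else
        note_unlock(&m);
    return m.count;
}
```

Without the wrapper the model leaves a residual count of 1 after each pair. With the wrapper it returns to 0. With `strict` set, the double lock is rejected before the unlock path is ever reached, which is the caveat above.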

khuey, Jul 14 '22 18:07