connectedhomeip
connectedhomeip copied to clipboard
Copious TSAN warnings on Linux
Problem
We have lots and lots of TSAN data race warnings on Linux that should be addressed before it results in other potentially bigger blow-ups that are more difficult to debug later.
Example: https://github.com/project-chip/connectedhomeip/runs/5039936950?check_suite_focus=true
Example output:
2022-02-03 02:03:15.802 INFO WARNING: ThreadSanitizer: data race (pid=31995)
2022-02-03 02:03:15.802 INFO Write of size 8 at 0x7b5000010700 by thread T4:
2022-02-03 02:03:15.803 INFO #0 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:795 (libtsan.so.0+0x3790f)
2022-02-03 02:03:15.803 INFO #1 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:793 (libtsan.so.0+0x3790f)
2022-02-03 02:03:15.804 INFO #2 g_slice_alloc0 <null> (libglib-2.0.so.0+0x70ac2)
2022-02-03 02:03:15.804 INFO
2022-02-03 02:03:15.805 INFO Previous write of size 8 at 0x7b5000010700 by thread T2:
2022-02-03 02:03:15.805 INFO #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d)
2022-02-03 02:03:15.805 INFO #1 <null> <null> (libglib-2.0.so.0+0x6f936)
2022-02-03 02:03:15.806 INFO #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585)
2022-02-03 02:03:15.806 INFO #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2)
2022-02-03 02:03:15.806 INFO #4 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445)
2022-02-03 02:03:15.807 INFO #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2)
2022-02-03 02:03:15.807 INFO #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3)
2022-02-03 02:03:15.807 INFO #7 <null> <null> (libstdc++.so.6+0xd6de3)
2022-02-03 02:03:15.807 INFO
2022-02-03 02:03:15.808 INFO Location is heap block of size 496 at 0x7b5000010600 allocated by thread T2:
2022-02-03 02:03:15.808 INFO #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d)
2022-02-03 02:03:15.808 INFO #1 <null> <null> (libglib-2.0.so.0+0x6f936)
2022-02-03 02:03:15.808 INFO #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585)
2022-02-03 02:03:15.808 INFO #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2)
2022-02-03 02:03:15.808 INFO #4 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445)
2022-02-03 02:03:15.809 INFO #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2)
2022-02-03 02:03:15.809 INFO #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3)
2022-02-03 02:03:15.809 INFO #7 <null> <null> (libstdc++.so.6+0xd6de3)
2022-02-03 02:03:15.809 INFO
2022-02-03 02:03:15.809 INFO Thread T4 'pool' (tid=32000, running) created by main thread at:
2022-02-03 02:03:15.810 INFO #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
2022-02-03 02:03:15.810 INFO #1 <null> <null> (libglib-2.0.so.0+0x9f2ba)
2022-02-03 02:03:15.810 INFO #2 chip::DeviceLayer::ConnectivityManagerImpl::StartWiFiManagement() ../../examples/tv-app/linux/third_party/connectedhomeip/src/platform/Linux/ConnectivityManagerImpl.cpp:675 (chip-tv-app+0x1b8e55)
2022-02-03 02:03:15.810 INFO #3 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:182 (chip-tv-app+0x40fc2)
2022-02-03 02:03:15.810 INFO #4 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5)
2022-02-03 02:03:15.810 INFO
2022-02-03 02:03:15.811 INFO Thread T2 (tid=31998, running) created by main thread at:
2022-02-03 02:03:15.811 INFO #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
2022-02-03 02:03:15.811 INFO #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xd70a8)
2022-02-03 02:03:15.811 INFO #2 chip::DeviceLayer::PlatformManagerImpl::_InitChipStack() <null> (chip-tv-app+0x1c5b11)
2022-02-03 02:03:15.811 INFO #3 chip::DeviceLayer::PlatformManager::InitChipStack() ../../examples/tv-app/linux/third_party/connectedhomeip/src/include/platform/PlatformManager.h:329 (chip-tv-app+0x3d37f)
2022-02-03 02:03:15.812 INFO #4 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:151 (chip-tv-app+0x40dee)
2022-02-03 02:03:15.812 INFO #5 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5)
2022-02-03 02:03:15.812 INFO
2022-02-03 02:03:15.812 INFO SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x70ac2) in g_slice_alloc0
2022-02-03 02:03:15.812 INFO ==================
V1 review: fixes in master for this are bug fixes and would be acceptable, however this is not a V1 blocker.
This might be related: https://gitlab.gnome.org/GNOME/glib/-/issues/1672
I've done some tests of this issue and it seems that on Ubuntu 22.04 TSAN has been fixed - it no longer reports such false positives.
One can use provided simple test code to check that with different TSAN versions. Compile the test code as follows:
g++ -g -O0 test.cpp -fsanitize=thread -pthread $(pkg-config --cflags --libs glib-2.0)
test.cpp
#include <iostream>
#include <thread>
#include <glib.h>
struct data {
int value;
};
static struct data *sharedData1;
static struct data *sharedData2;
void glibMainLoopThread(GMainLoop *loop) {
g_main_loop_run(loop);
}
gboolean glibCallback1(void * data) {
auto *sharedData = reinterpret_cast<struct data *>(data);
std::cout << "T: sharedData1 = " << sharedData->value << std::endl;
sharedData->value++;
return FALSE;
}
gboolean glibCallback2(void * data) {
auto *sharedData = reinterpret_cast<struct data *>(data);
std::cout << "T: sharedData2 = " << sharedData->value << std::endl;
sharedData->value++;
return FALSE;
}
int main() {
sharedData1 = new struct data({10});
GMainLoop *loop = g_main_loop_new(NULL, FALSE);
std::thread t(glibMainLoopThread, loop);
sharedData2 = new struct data({20});
// That's not a data race
// NOTE: TSAN on Ubuntu 20.04 reports it (false positive)
std::cout << "sharedData1 = " << sharedData1->value << std::endl;
std::cout << "sharedData2 = " << sharedData2->value << std::endl;
g_idle_add(glibCallback1, sharedData1);
g_idle_add(glibCallback2, sharedData2);
// Here we've got a data race (callback is already added to main loop thread)
// std::cout << "sharedData2 = " << sharedData2->value << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
g_main_loop_quit(loop);
t.join();
std::cout << "sharedData1 = " << sharedData1->value << std::endl;
std::cout << "sharedData2 = " << sharedData2->value << std::endl;
delete sharedData1;
delete sharedData2;
return 0;
}
On Ubuntu 22.04 I've got no errors (unless commented std::cout line is included). However, on Ubuntu 20.04, I've got error mentioned in https://github.com/project-chip/connectedhomeip/issues/14710#issuecomment-1028551023.
Output from Ubuntu 22.04:
$ ./a.out
sharedData1 = 10
sharedData2 = 20
T: sharedData1 = 10
T: sharedData2 = 20
sharedData1 = 11
sharedData2 = 21
With our CI the problem occurs because as for now the ubuntu-latest == ubuntu-20.04. So, maybe we should change to use ubuntu-22.04 with workflows where TSAN is used?
Issue Scrub: @woody-apple to pin us to ubuntu-22.04 in CI. @andy31415 will help check to make sure warnings aren't being suppressed afterwards.
After more research I've found the TRUE cause of false positives. The proper answer is here: https://stackoverflow.com/a/74328089
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.