connectedhomeip icon indicating copy to clipboard operation
connectedhomeip copied to clipboard

Copious TSAN warnings on Linux

Open mrjerryjohns opened this issue 3 years ago • 3 comments

Problem

We have lots and lots of TSAN data race warnings on Linux that should be addressed before it results in other potentially bigger blow-ups that are more difficult to debug later.

Example: https://github.com/project-chip/connectedhomeip/runs/5039936950?check_suite_focus=true

mrjerryjohns avatar Feb 02 '22 18:02 mrjerryjohns

Example output:
2022-02-03 02:03:15.802 INFO    WARNING: ThreadSanitizer: data race (pid=31995)
2022-02-03 02:03:15.802 INFO      Write of size 8 at 0x7b5000010700 by thread T4:
2022-02-03 02:03:15.803 INFO        #0 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:795 (libtsan.so.0+0x3790f)
2022-02-03 02:03:15.803 INFO        #1 memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:793 (libtsan.so.0+0x3790f)
2022-02-03 02:03:15.804 INFO        #2 g_slice_alloc0 <null> (libglib-2.0.so.0+0x70ac2)
2022-02-03 02:03:15.804 INFO    
2022-02-03 02:03:15.805 INFO      Previous write of size 8 at 0x7b5000010700 by thread T2:
2022-02-03 02:03:15.805 INFO        #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d)
2022-02-03 02:03:15.805 INFO        #1 <null> <null> (libglib-2.0.so.0+0x6f936)
2022-02-03 02:03:15.806 INFO        #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585)
2022-02-03 02:03:15.806 INFO        #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2)
2022-02-03 02:03:15.806 INFO        #4 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445)
2022-02-03 02:03:15.807 INFO        #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2)
2022-02-03 02:03:15.807 INFO        #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3)
2022-02-03 02:03:15.807 INFO        #7 <null> <null> (libstdc++.so.6+0xd6de3)
2022-02-03 02:03:15.807 INFO    
2022-02-03 02:03:15.808 INFO      Location is heap block of size 496 at 0x7b5000010600 allocated by thread T2:
2022-02-03 02:03:15.808 INFO        #0 posix_memalign ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:835 (libtsan.so.0+0x3046d)
2022-02-03 02:03:15.808 INFO        #1 <null> <null> (libglib-2.0.so.0+0x6f936)
2022-02-03 02:03:15.808 INFO        #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/9/bits/invoke.h:60 (chip-tv-app+0x1c9585)
2022-02-03 02:03:15.808 INFO        #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/9/bits/invoke.h:95 (chip-tv-app+0x1c94e2)
2022-02-03 02:03:15.808 INFO        #4 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/9/thread:244 (chip-tv-app+0x1c9445)
2022-02-03 02:03:15.809 INFO        #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/9/thread:251 (chip-tv-app+0x1c93f2)
2022-02-03 02:03:15.809 INFO        #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/9/thread:195 (chip-tv-app+0x1c93b3)
2022-02-03 02:03:15.809 INFO        #7 <null> <null> (libstdc++.so.6+0xd6de3)
2022-02-03 02:03:15.809 INFO    
2022-02-03 02:03:15.809 INFO      Thread T4 'pool' (tid=32000, running) created by main thread at:
2022-02-03 02:03:15.810 INFO        #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
2022-02-03 02:03:15.810 INFO        #1 <null> <null> (libglib-2.0.so.0+0x9f2ba)
2022-02-03 02:03:15.810 INFO        #2 chip::DeviceLayer::ConnectivityManagerImpl::StartWiFiManagement() ../../examples/tv-app/linux/third_party/connectedhomeip/src/platform/Linux/ConnectivityManagerImpl.cpp:675 (chip-tv-app+0x1b8e55)
2022-02-03 02:03:15.810 INFO        #3 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:182 (chip-tv-app+0x40fc2)
2022-02-03 02:03:15.810 INFO        #4 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5)
2022-02-03 02:03:15.810 INFO    
2022-02-03 02:03:15.811 INFO      Thread T2 (tid=31998, running) created by main thread at:
2022-02-03 02:03:15.811 INFO        #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
2022-02-03 02:03:15.811 INFO        #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xd70a8)
2022-02-03 02:03:15.811 INFO        #2 chip::DeviceLayer::PlatformManagerImpl::_InitChipStack() <null> (chip-tv-app+0x1c5b11)
2022-02-03 02:03:15.811 INFO        #3 chip::DeviceLayer::PlatformManager::InitChipStack() ../../examples/tv-app/linux/third_party/connectedhomeip/src/include/platform/PlatformManager.h:329 (chip-tv-app+0x3d37f)
2022-02-03 02:03:15.812 INFO        #4 ChipLinuxAppInit(int, char**) ../../examples/tv-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:151 (chip-tv-app+0x40dee)
2022-02-03 02:03:15.812 INFO        #5 main ../../examples/tv-app/linux/main.cpp:171 (chip-tv-app+0x3c3f5)
2022-02-03 02:03:15.812 INFO    
2022-02-03 02:03:15.812 INFO    SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x70ac2) in g_slice_alloc0
2022-02-03 02:03:15.812 INFO    ==================

bzbarsky-apple avatar Feb 03 '22 02:02 bzbarsky-apple

V1 review: fixes in master for this are bug fixes and would be acceptable, however this is not a V1 blocker.

andy31415 avatar Aug 24 '22 17:08 andy31415

This might be related: https://gitlab.gnome.org/GNOME/glib/-/issues/1672

arkq avatar Oct 27 '22 15:10 arkq

I've done some tests of this issue and it seems that on Ubuntu 22.04 TSAN has been fixed - it no longer reports such false positives.

One can use provided simple test code to check that with different TSAN versions. Compile the test code as follows: g++ -g -O0 test.cpp -fsanitize=thread -pthread $(pkg-config --cflags --libs glib-2.0)

test.cpp
#include <iostream>
#include <thread>
#include <glib.h>

struct data {
        int value;
};

static struct data *sharedData1;
static struct data *sharedData2;

void glibMainLoopThread(GMainLoop *loop) {
        g_main_loop_run(loop);
}

gboolean glibCallback1(void * data) {
        auto *sharedData = reinterpret_cast<struct data *>(data);
        std::cout << "T: sharedData1 = " << sharedData->value << std::endl;
        sharedData->value++;
        return FALSE;
}

gboolean glibCallback2(void * data) {
        auto *sharedData = reinterpret_cast<struct data *>(data);
        std::cout << "T: sharedData2 = " << sharedData->value << std::endl;
        sharedData->value++;
        return FALSE;
}

int main() {

        sharedData1 = new struct data({10});

        GMainLoop *loop = g_main_loop_new(NULL, FALSE);
        std::thread t(glibMainLoopThread, loop);

        sharedData2 = new struct data({20});

        // That's not a data race
        // NOTE: TSAN on Ubuntu 20.04 reports it (false positive)
        std::cout << "sharedData1 = " << sharedData1->value << std::endl;
        std::cout << "sharedData2 = " << sharedData2->value << std::endl;

        g_idle_add(glibCallback1, sharedData1);
        g_idle_add(glibCallback2, sharedData2);

        // Here we've got a data race (callback is already added to main loop thread)
        // std::cout << "sharedData2 = " << sharedData2->value << std::endl;

        std::this_thread::sleep_for(std::chrono::seconds(1));

        g_main_loop_quit(loop);
        t.join();

        std::cout << "sharedData1 = " << sharedData1->value << std::endl;
        std::cout << "sharedData2 = " << sharedData2->value << std::endl;

        delete sharedData1;
        delete sharedData2;
        return 0;
}

On Ubuntu 22.04 I've got no errors (unless commented std::cout line is included). However, on Ubuntu 20.04, I've got error mentioned in https://github.com/project-chip/connectedhomeip/issues/14710#issuecomment-1028551023.

Output from Ubuntu 22.04:

$ ./a.out
sharedData1 = 10
sharedData2 = 20
T: sharedData1 = 10
T: sharedData2 = 20
sharedData1 = 11
sharedData2 = 21

With our CI the problem occurs because as for now the ubuntu-latest == ubuntu-20.04. So, maybe we should change to use ubuntu-22.04 with workflows where TSAN is used?

arkq avatar Oct 29 '22 20:10 arkq

Issue Scrub: @woody-apple to pin us to ubuntu-22.04 in CI. @andy31415 will help check to make sure warnings aren't being suppressed afterwards.

woody-apple avatar Nov 02 '22 17:11 woody-apple

After more research I've found the TRUE cause of false positives. The proper answer is here: https://stackoverflow.com/a/74328089

arkq avatar Nov 07 '22 09:11 arkq

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar May 08 '23 21:05 stale[bot]