oneTBB icon indicating copy to clipboard operation
oneTBB copied to clipboard

Clang thread sanitizer (TSan) reports many errors, most tests fail

Open seanm opened this issue 4 years ago • 19 comments

I built TBB from git master using Clang's -fsanitize=thread option:

https://clang.llvm.org/docs/ThreadSanitizer.html

The result is that almost all the unit tests fail.

Are these false positives or true positives?

If false positives, is there an 'official' suppression list?

seanm avatar Mar 05 '21 18:03 seanm

There are a lot of true positives (especially, in tbbmalloc). We are fixing them from time to time. Some basics tests, (e.g. for parallel_for, parallel_reduce and task_group) are not expected to issue hits without tbbmalloc.

alexey-katranov avatar Mar 06 '21 08:03 alexey-katranov

There are a lot of true positives

Oh.

We are fixing them from time to time.

I would have thought threading correctness issues in a threading library would be high priority. :) In my experience, TSan makes these otherwise hard to debug and hard to reproduce issues much easier to fix, has that not been the case with TBB?

Some basics tests, (e.g. for parallel_for...

I just tried ctest -R parallel_for -V and TSan reports data races there. The backtraces do contains a lot of libtbbmalloc_debug.dylib.

I'm not sure what tbbmalloc is, is it some build option that can be disabled?

seanm avatar Mar 08 '21 19:03 seanm

I would have thought threading correctness issues in a threading library would be high priority. :) In my experience, TSan makes these otherwise hard to debug and hard to reproduce issues much easier to fix, has that not been the case with TBB?

Definitely threading correctness issues is high priority; however, C++ races reported with Thread Sanitizer are not always correctness issues on real hardware. Back to 2006 when the first version of TBB was released, C++ did not know anything about threading, i.e. any threaded application had UB from pre-C++11 perspective. In such environment, TBB (and other threading solutions) was build based on experience, practice and knowledge about target architecture. The first supported platform was x86 with its TSO memory model. It is relatively strong model, i.e. any properly aligned int provides release-acquire semantics for any store-load pair. This basic assumption became fundamental thing inside TBB, especially in tbbmalloc. With new platforms (like IA-64, ARM and others) in pre-C++11 environment, the implementation became even more complicated but remained correct for each particular platform.

With C++11, we started moving from practice based approaches to C++11 memory model. With oneTBB, we replaced all tbb::atomic with std::atomic that allowed us using such tools like Thread Sanitizer. However, not all synchronizations were based on tbb::atomics, some of them are still based on primitives types (known to work correctly) but they produce C++ races that Thread Sanitizer detects. Much of them does not contribute into correctness issues; however, we pay attention to each reported race because several issues were found and fixed that might affect correctness.

I'm not sure what tbbmalloc is, is it some build option that can be disabled?

tbbmalloc is a scalable allocator that is supposed to improve TBB performance in heavy threaded applications. It is optional but I do not think that TBB cmake has an option to disable the build of tbbmalloc. There are two approaches to try tests without tbbmalloc:

  • build only required tests, e.g. make conformance_parallel_for test_parallel_for
  • build all but remove libtbbmalloc* files from build directory

alexey-katranov avatar Mar 09 '21 07:03 alexey-katranov

Thanks for the detailed response, much appreciated!

So I've just deleted all the libtbbmalloc* files and tried again. TSan still complains but now I indeed don't see tbbmalloc in the backtraces. Here's the output:

$ ctest -R parallel_for -V
UpdateCTestConfiguration  from :/Users/sean/external/TBB-test-bin/DartConfiguration.tcl
UpdateCTestConfiguration  from :/Users/sean/external/TBB-test-bin/DartConfiguration.tcl
Test project /Users/sean/external/TBB-test-bin
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 11
    Start 11: test_parallel_for

11: Test command: /Users/sean/external/TBB-test-bin/appleclang_11.0_cxx11_64_debug/test_parallel_for "--force-colors=1"
11: Test timeout computed to be: 10000000
11: [doctest] doctest version is "2.3.5"
11: [doctest] run with "--help" for options
11: ===============================================================================
11: [doctest] test cases:      5 |      5 passed |      0 failed |      0 skipped
11: [doctest] assertions:   1125 |   1125 passed |      0 failed |
11: [doctest] Status: SUCCESS!
11: ==================
11: WARNING: ThreadSanitizer: data race (pid=29364)
11:   Write of size 8 at 0x7e80034a70e0 by thread T11:
11:     #0 tbb::detail::r1::circular_doubly_linked_list_with_sentinel::base_node::base_node() concurrent_monitor.h:41 (libtbb_debug.12.dylib:x86_64+0x3382)
11:     #1 tbb::detail::r1::wait_node<tbb::detail::r1::address_context>::wait_node(tbb::detail::r1::address_context) concurrent_monitor.h:106 (libtbb_debug.12.dylib:x86_64+0x2f6e)
11:     #2 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2ebc)
11:     #3 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2268)
11:     #4 tbb::detail::r1::wait_on_address(void*, tbb::detail::d1::delegate_base&, unsigned long) address_waiter.cpp:70 (libtbb_debug.12.dylib:x86_64+0x1f96)
11:     #5 void tbb::detail::d1::adaptive_wait_on_address<tbb::detail::d1::rw_mutex::lock()::'lambda'()>(void*, tbb::detail::d1::rw_mutex::lock()::'lambda'(), unsigned long) _waitable_atomic.h:38 (libtbb_debug.12.dylib:x86_64+0x604c4)
11:     #6 tbb::detail::d1::rw_mutex::lock() rw_mutex.h:64 (libtbb_debug.12.dylib:x86_64+0x5ba71)
11:     #7 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::acquire(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:106 (libtbb_debug.12.dylib:x86_64+0x619c3)
11:     #8 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:86 (libtbb_debug.12.dylib:x86_64+0x618b7)
11:     #9 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:85 (libtbb_debug.12.dylib:x86_64+0x5a8ac)
11:     #10 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:517 (libtbb_debug.12.dylib:x86_64+0x5d09b)
11:     #11 tbb::detail::r1::arena::is_out_of_work() arena.cpp:320 (libtbb_debug.12.dylib:x86_64+0xd0de)
11:     #12 tbb::detail::r1::waiter_base::pause() waiters.h:36 (libtbb_debug.12.dylib:x86_64+0x26823)
11:     #13 tbb::detail::r1::outermost_worker_waiter::pause(tbb::detail::r1::arena_slot&) waiters.h:69 (libtbb_debug.12.dylib:x86_64+0x24014)
11:     #14 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool) task_dispatcher.h:231 (libtbb_debug.12.dylib:x86_64+0x27940)
11:     #15 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:343 (libtbb_debug.12.dylib:x86_64+0x1ee5f)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0xa2d9)
11:     #17 tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) arena.cpp:133 (libtbb_debug.12.dylib:x86_64+0x9500)
11:     #18 tbb::detail::r1::market::process(rml::job&) market.cpp:596 (libtbb_debug.12.dylib:x86_64+0x5dca9)
11:     #19 tbb::detail::r1::rml::private_worker::run() private_server.cpp:267 (libtbb_debug.12.dylib:x86_64+0x6e4d6)
11:     #20 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Previous write of size 8 at 0x7e80034a70e0 by thread T2:
11:     #0 tbb::detail::r1::circular_doubly_linked_list_with_sentinel::add(tbb::detail::r1::circular_doubly_linked_list_with_sentinel::base_node*) concurrent_monitor.h:57 (libtbb_debug.12.dylib:x86_64+0x6195)
11:     #1 void tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::address_context>::notify_relaxed<tbb::detail::r1::notify_by_address(void*, unsigned long)::$_0>(tbb::detail::r1::notify_by_address(void*, unsigned long)::$_0 const&) concurrent_monitor.h:359 (libtbb_debug.12.dylib:x86_64+0x2578)
11:     #2 tbb::detail::r1::notify_by_address(void*, unsigned long) address_waiter.cpp:80 (libtbb_debug.12.dylib:x86_64+0x2335)
11:     #3 tbb::detail::d1::rw_mutex::unlock_shared() rw_mutex.h:137 (libtbb_debug.12.dylib:x86_64+0x62527)
11:     #4 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::release() _scoped_lock.h:131 (libtbb_debug.12.dylib:x86_64+0x6243e)
11:     #5 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::~rw_scoped_lock() _scoped_lock.h:92 (libtbb_debug.12.dylib:x86_64+0x62349)
11:     #6 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::~rw_scoped_lock() _scoped_lock.h:90 (libtbb_debug.12.dylib:x86_64+0x5ae18)
11:     #7 tbb::detail::r1::market::arena_in_need(tbb::detail::r1::arena*) market.cpp:383 (libtbb_debug.12.dylib:x86_64+0x5c0e5)
11:     #8 tbb::detail::r1::market::process(rml::job&) market.cpp:595 (libtbb_debug.12.dylib:x86_64+0x5dc81)
11:     #9 tbb::detail::r1::rml::private_worker::run() private_server.cpp:267 (libtbb_debug.12.dylib:x86_64+0x6e4d6)
11:     #10 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Location is stack of thread T11.
11: 
11:   Thread T11 (tid=28826241, running) created by thread T6 at:
11:     #0 pthread_create <null>:55209328 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::propagate_chain_reaction() private_server.cpp:159 (libtbb_debug.12.dylib:x86_64+0x6f200)
11:     #5 tbb::detail::r1::rml::private_worker::run() private_server.cpp:258 (libtbb_debug.12.dylib:x86_64+0x6e3f6)
11:     #6 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Thread T2 (tid=28826230, running) created by main thread at:
11:     #0 pthread_create <null>:55209328 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::adjust_job_count_estimate(int) private_server.cpp:408 (libtbb_debug.12.dylib:x86_64+0x710b0)
11:     #5 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:585 (libtbb_debug.12.dylib:x86_64+0x5d7f8)
11:     #6 void tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0>() arena.h:540 (libtbb_debug.12.dylib:x86_64+0x83722)
11:     #7 tbb::detail::r1::spawn_and_notify(tbb::detail::d1::task&, tbb::detail::r1::arena_slot*, tbb::detail::r1::arena*) task_dispatcher.cpp:26 (libtbb_debug.12.dylib:x86_64+0x828e2)
11:     #8 tbb::detail::r1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:39 (libtbb_debug.12.dylib:x86_64+0x8285a)
11:     #9 tbb::detail::d1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) _task.h:187 (test_parallel_for:x86_64+0x1000344bd)
11:     #10 tbb::detail::d1::auto_partition_type::spawn_task(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) partitioner.h:504 (test_parallel_for:x86_64+0x1000343f4)
11:     #11 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::spawn_self(tbb::detail::d1::execution_data&) parallel_for.h:121 (test_parallel_for:x86_64+0x1000336a7)
11:     #12 void tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work_impl<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d0::split&>(tbb::detail::d1::execution_data&, tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&&&, tbb::detail::d0::split&&&) parallel_for.h:117 (test_parallel_for:x86_64+0x100033344)
11:     #13 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work(tbb::detail::d0::split&, tbb::detail::d1::execution_data&) parallel_for.h:99 (test_parallel_for:x86_64+0x100032bd9)
11:     #14 void tbb::detail::d1::partition_type_base<tbb::detail::d1::auto_partition_type>::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) partitioner.h:284 (test_parallel_for:x86_64+0x100032511)
11:     #15 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) parallel_for.h:147 (test_parallel_for:x86_64+0x100031e6a)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:315 (libtbb_debug.12.dylib:x86_64+0x8d752)
11:     #17 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0x84079)
11:     #18 tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:168 (libtbb_debug.12.dylib:x86_64+0x839fb)
11:     #19 tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:121 (libtbb_debug.12.dylib:x86_64+0x8380f)
11:     #20 tbb::detail::d1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) _task.h:196 (test_parallel_for:x86_64+0x100031868)
11:     #21 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&, tbb::detail::d1::task_group_context&) parallel_for.h:89 (test_parallel_for:x86_64+0x100031575)
11:     #22 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&) parallel_for.h:78 (test_parallel_for:x86_64+0x10003133a)
11:     #23 void tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> > >(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&) parallel_for.h:205 (test_parallel_for:x86_64+0x100030bfd)
11:     #24 void TestVectorTypes<ClassWithVectorType<float vector[4]> >() test_parallel_for.cpp:76 (test_parallel_for:x86_64+0x100030502)
11:     #25 _DOCTEST_ANON_FUNC_84() test_parallel_for.cpp:316 (test_parallel_for:x86_64+0x10001d275)
11:     #26 doctest::Context::run() doctest.h:5882 (test_parallel_for:x86_64+0x100015653)
11:     #27 main doctest.h:5962 (test_parallel_for:x86_64+0x10001959d)
11: 
11: SUMMARY: ThreadSanitizer: data race concurrent_monitor.h:41 in tbb::detail::r1::circular_doubly_linked_list_with_sentinel::base_node::base_node()
11: ==================
11: ==================
11: WARNING: ThreadSanitizer: data race (pid=29364)
11:   Write of size 8 at 0x7e80034a70e8 by thread T11:
11:     #0 tbb::detail::r1::circular_doubly_linked_list_with_sentinel::base_node::base_node() concurrent_monitor.h:41 (libtbb_debug.12.dylib:x86_64+0x339a)
11:     #1 tbb::detail::r1::wait_node<tbb::detail::r1::address_context>::wait_node(tbb::detail::r1::address_context) concurrent_monitor.h:106 (libtbb_debug.12.dylib:x86_64+0x2f6e)
11:     #2 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2ebc)
11:     #3 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2268)
11:     #4 tbb::detail::r1::wait_on_address(void*, tbb::detail::d1::delegate_base&, unsigned long) address_waiter.cpp:70 (libtbb_debug.12.dylib:x86_64+0x1f96)
11:     #5 void tbb::detail::d1::adaptive_wait_on_address<tbb::detail::d1::rw_mutex::lock()::'lambda'()>(void*, tbb::detail::d1::rw_mutex::lock()::'lambda'(), unsigned long) _waitable_atomic.h:38 (libtbb_debug.12.dylib:x86_64+0x604c4)
11:     #6 tbb::detail::d1::rw_mutex::lock() rw_mutex.h:64 (libtbb_debug.12.dylib:x86_64+0x5ba71)
11:     #7 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::acquire(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:106 (libtbb_debug.12.dylib:x86_64+0x619c3)
11:     #8 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:86 (libtbb_debug.12.dylib:x86_64+0x618b7)
11:     #9 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:85 (libtbb_debug.12.dylib:x86_64+0x5a8ac)
11:     #10 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:517 (libtbb_debug.12.dylib:x86_64+0x5d09b)
11:     #11 tbb::detail::r1::arena::is_out_of_work() arena.cpp:320 (libtbb_debug.12.dylib:x86_64+0xd0de)
11:     #12 tbb::detail::r1::waiter_base::pause() waiters.h:36 (libtbb_debug.12.dylib:x86_64+0x26823)
11:     #13 tbb::detail::r1::outermost_worker_waiter::pause(tbb::detail::r1::arena_slot&) waiters.h:69 (libtbb_debug.12.dylib:x86_64+0x24014)
11:     #14 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool) task_dispatcher.h:231 (libtbb_debug.12.dylib:x86_64+0x27940)
11:     #15 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:343 (libtbb_debug.12.dylib:x86_64+0x1ee5f)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0xa2d9)
11:     #17 tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) arena.cpp:133 (libtbb_debug.12.dylib:x86_64+0x9500)
11:     #18 tbb::detail::r1::market::process(rml::job&) market.cpp:596 (libtbb_debug.12.dylib:x86_64+0x5dca9)
11:     #19 tbb::detail::r1::rml::private_worker::run() private_server.cpp:267 (libtbb_debug.12.dylib:x86_64+0x6e4d6)
11:     #20 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Previous write of size 8 at 0x7e80034a70e8 by thread T2:
11:     [failed to restore the stack]
11: 
11:   Location is stack of thread T11.
11: 
11:   Thread T11 (tid=28826241, running) created by thread T6 at:
11:     #0 pthread_create <null>:55209328 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::propagate_chain_reaction() private_server.cpp:159 (libtbb_debug.12.dylib:x86_64+0x6f200)
11:     #5 tbb::detail::r1::rml::private_worker::run() private_server.cpp:258 (libtbb_debug.12.dylib:x86_64+0x6e3f6)
11:     #6 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Thread T2 (tid=28826230, running) created by main thread at:
11:     #0 pthread_create <null>:55209328 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::adjust_job_count_estimate(int) private_server.cpp:408 (libtbb_debug.12.dylib:x86_64+0x710b0)
11:     #5 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:585 (libtbb_debug.12.dylib:x86_64+0x5d7f8)
11:     #6 void tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0>() arena.h:540 (libtbb_debug.12.dylib:x86_64+0x83722)
11:     #7 tbb::detail::r1::spawn_and_notify(tbb::detail::d1::task&, tbb::detail::r1::arena_slot*, tbb::detail::r1::arena*) task_dispatcher.cpp:26 (libtbb_debug.12.dylib:x86_64+0x828e2)
11:     #8 tbb::detail::r1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:39 (libtbb_debug.12.dylib:x86_64+0x8285a)
11:     #9 tbb::detail::d1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) _task.h:187 (test_parallel_for:x86_64+0x1000344bd)
11:     #10 tbb::detail::d1::auto_partition_type::spawn_task(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) partitioner.h:504 (test_parallel_for:x86_64+0x1000343f4)
11:     #11 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::spawn_self(tbb::detail::d1::execution_data&) parallel_for.h:121 (test_parallel_for:x86_64+0x1000336a7)
11:     #12 void tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work_impl<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d0::split&>(tbb::detail::d1::execution_data&, tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&&&, tbb::detail::d0::split&&&) parallel_for.h:117 (test_parallel_for:x86_64+0x100033344)
11:     #13 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work(tbb::detail::d0::split&, tbb::detail::d1::execution_data&) parallel_for.h:99 (test_parallel_for:x86_64+0x100032bd9)
11:     #14 void tbb::detail::d1::partition_type_base<tbb::detail::d1::auto_partition_type>::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) partitioner.h:284 (test_parallel_for:x86_64+0x100032511)
11:     #15 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) parallel_for.h:147 (test_parallel_for:x86_64+0x100031e6a)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:315 (libtbb_debug.12.dylib:x86_64+0x8d752)
11:     #17 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0x84079)
11:     #18 tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:168 (libtbb_debug.12.dylib:x86_64+0x839fb)
11:     #19 tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:121 (libtbb_debug.12.dylib:x86_64+0x8380f)
11:     #20 tbb::detail::d1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) _task.h:196 (test_parallel_for:x86_64+0x100031868)
11:     #21 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&, tbb::detail::d1::task_group_context&) parallel_for.h:89 (test_parallel_for:x86_64+0x100031575)
11:     #22 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&) parallel_for.h:78 (test_parallel_for:x86_64+0x10003133a)
11:     #23 void tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> > >(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&) parallel_for.h:205 (test_parallel_for:x86_64+0x100030bfd)
11:     #24 void TestVectorTypes<ClassWithVectorType<float vector[4]> >() test_parallel_for.cpp:76 (test_parallel_for:x86_64+0x100030502)
11:     #25 _DOCTEST_ANON_FUNC_84() test_parallel_for.cpp:316 (test_parallel_for:x86_64+0x10001d275)
11:     #26 doctest::Context::run() doctest.h:5882 (test_parallel_for:x86_64+0x100015653)
11:     #27 main doctest.h:5962 (test_parallel_for:x86_64+0x10001959d)
11: 
11: SUMMARY: ThreadSanitizer: data race concurrent_monitor.h:41 in tbb::detail::r1::circular_doubly_linked_list_with_sentinel::base_node::base_node()
11: ==================
11: ==================
11: WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=29364)
11:   Write of size 8 at 0x7e80034a70d8 by thread T11:
11:     #0 tbb::detail::r1::wait_node<tbb::detail::r1::address_context>::wait_node(tbb::detail::r1::address_context) concurrent_monitor.h:106 (libtbb_debug.12.dylib:x86_64+0x2f88)
11:     #1 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2ebc)
11:     #2 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::sleep_node(tbb::detail::r1::address_context) concurrent_monitor.h:146 (libtbb_debug.12.dylib:x86_64+0x2268)
11:     #3 tbb::detail::r1::wait_on_address(void*, tbb::detail::d1::delegate_base&, unsigned long) address_waiter.cpp:70 (libtbb_debug.12.dylib:x86_64+0x1f96)
11:     #4 void tbb::detail::d1::adaptive_wait_on_address<tbb::detail::d1::rw_mutex::lock()::'lambda'()>(void*, tbb::detail::d1::rw_mutex::lock()::'lambda'(), unsigned long) _waitable_atomic.h:38 (libtbb_debug.12.dylib:x86_64+0x604c4)
11:     #5 tbb::detail::d1::rw_mutex::lock() rw_mutex.h:64 (libtbb_debug.12.dylib:x86_64+0x5ba71)
11:     #6 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::acquire(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:106 (libtbb_debug.12.dylib:x86_64+0x619c3)
11:     #7 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:86 (libtbb_debug.12.dylib:x86_64+0x618b7)
11:     #8 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:85 (libtbb_debug.12.dylib:x86_64+0x5a8ac)
11:     #9 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:517 (libtbb_debug.12.dylib:x86_64+0x5d09b)
11:     #10 tbb::detail::r1::arena::is_out_of_work() arena.cpp:320 (libtbb_debug.12.dylib:x86_64+0xd0de)
11:     #11 tbb::detail::r1::waiter_base::pause() waiters.h:36 (libtbb_debug.12.dylib:x86_64+0x26823)
11:     #12 tbb::detail::r1::outermost_worker_waiter::pause(tbb::detail::r1::arena_slot&) waiters.h:69 (libtbb_debug.12.dylib:x86_64+0x24014)
11:     #13 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool) task_dispatcher.h:231 (libtbb_debug.12.dylib:x86_64+0x27940)
11:     #14 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:343 (libtbb_debug.12.dylib:x86_64+0x1ee5f)
11:     #15 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0xa2d9)
11:     #16 tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) arena.cpp:133 (libtbb_debug.12.dylib:x86_64+0x9500)
11:     #17 tbb::detail::r1::market::process(rml::job&) market.cpp:596 (libtbb_debug.12.dylib:x86_64+0x5dca9)
11:     #18 tbb::detail::r1::rml::private_worker::run() private_server.cpp:267 (libtbb_debug.12.dylib:x86_64+0x6e4d6)
11:     #19 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Previous read of size 8 at 0x7e80034a70d8 by thread T2:
11:     [failed to restore the stack]
11: 
11:   Location is stack of thread T11.
11: 
11:   Thread T11 (tid=28826241, running) created by thread T6 at:
11:     #0 pthread_create <null>:55209296 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::propagate_chain_reaction() private_server.cpp:159 (libtbb_debug.12.dylib:x86_64+0x6f200)
11:     #5 tbb::detail::r1::rml::private_worker::run() private_server.cpp:258 (libtbb_debug.12.dylib:x86_64+0x6e3f6)
11:     #6 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Thread T2 (tid=28826230, running) created by main thread at:
11:     #0 pthread_create <null>:55209296 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::adjust_job_count_estimate(int) private_server.cpp:408 (libtbb_debug.12.dylib:x86_64+0x710b0)
11:     #5 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:585 (libtbb_debug.12.dylib:x86_64+0x5d7f8)
11:     #6 void tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0>() arena.h:540 (libtbb_debug.12.dylib:x86_64+0x83722)
11:     #7 tbb::detail::r1::spawn_and_notify(tbb::detail::d1::task&, tbb::detail::r1::arena_slot*, tbb::detail::r1::arena*) task_dispatcher.cpp:26 (libtbb_debug.12.dylib:x86_64+0x828e2)
11:     #8 tbb::detail::r1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:39 (libtbb_debug.12.dylib:x86_64+0x8285a)
11:     #9 tbb::detail::d1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) _task.h:187 (test_parallel_for:x86_64+0x1000344bd)
11:     #10 tbb::detail::d1::auto_partition_type::spawn_task(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) partitioner.h:504 (test_parallel_for:x86_64+0x1000343f4)
11:     #11 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::spawn_self(tbb::detail::d1::execution_data&) parallel_for.h:121 (test_parallel_for:x86_64+0x1000336a7)
11:     #12 void tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work_impl<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d0::split&>(tbb::detail::d1::execution_data&, tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&&&, tbb::detail::d0::split&&&) parallel_for.h:117 (test_parallel_for:x86_64+0x100033344)
11:     #13 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work(tbb::detail::d0::split&, tbb::detail::d1::execution_data&) parallel_for.h:99 (test_parallel_for:x86_64+0x100032bd9)
11:     #14 void tbb::detail::d1::partition_type_base<tbb::detail::d1::auto_partition_type>::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) partitioner.h:284 (test_parallel_for:x86_64+0x100032511)
11:     #15 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) parallel_for.h:147 (test_parallel_for:x86_64+0x100031e6a)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:315 (libtbb_debug.12.dylib:x86_64+0x8d752)
11:     #17 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0x84079)
11:     #18 tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:168 (libtbb_debug.12.dylib:x86_64+0x839fb)
11:     #19 tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:121 (libtbb_debug.12.dylib:x86_64+0x8380f)
11:     #20 tbb::detail::d1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) _task.h:196 (test_parallel_for:x86_64+0x100031868)
11:     #21 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&, tbb::detail::d1::task_group_context&) parallel_for.h:89 (test_parallel_for:x86_64+0x100031575)
11:     #22 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&) parallel_for.h:78 (test_parallel_for:x86_64+0x10003133a)
11:     #23 void tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> > >(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&) parallel_for.h:205 (test_parallel_for:x86_64+0x100030bfd)
11:     #24 void TestVectorTypes<ClassWithVectorType<float vector[4]> >() test_parallel_for.cpp:76 (test_parallel_for:x86_64+0x100030502)
11:     #25 _DOCTEST_ANON_FUNC_84() test_parallel_for.cpp:316 (test_parallel_for:x86_64+0x10001d275)
11:     #26 doctest::Context::run() doctest.h:5882 (test_parallel_for:x86_64+0x100015653)
11:     #27 main doctest.h:5962 (test_parallel_for:x86_64+0x10001959d)
11: 
11: SUMMARY: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) concurrent_monitor.h:106 in tbb::detail::r1::wait_node<tbb::detail::r1::address_context>::wait_node(tbb::detail::r1::address_context)
11: ==================
11: ==================
11: WARNING: ThreadSanitizer: data race (pid=29364)
11:   Write of size 4 at 0x7e80034a7108 by thread T11:
11:     #0 tbb::detail::r1::binary_semaphore::binary_semaphore() semaphore.h:249 (libtbb_debug.12.dylib:x86_64+0x375c)
11:     #1 tbb::detail::r1::binary_semaphore::binary_semaphore() semaphore.h:249 (libtbb_debug.12.dylib:x86_64+0x36b8)
11:     #2 tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>::init() concurrent_monitor.h:160 (libtbb_debug.12.dylib:x86_64+0x30e6)
11:     #3 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::address_context>::prepare_wait(tbb::detail::r1::wait_node<tbb::detail::r1::address_context>&) concurrent_monitor.h:201 (libtbb_debug.12.dylib:x86_64+0x5b73)
11:     #4 bool tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::address_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>, tbb::detail::d1::delegate_base&>(tbb::detail::d1::delegate_base&&&, tbb::detail::r1::sleep_node<tbb::detail::r1::address_context>&&) concurrent_monitor.h:255 (libtbb_debug.12.dylib:x86_64+0x20e8)
11:     #5 tbb::detail::r1::wait_on_address(void*, tbb::detail::d1::delegate_base&, unsigned long) address_waiter.cpp:70 (libtbb_debug.12.dylib:x86_64+0x1fb2)
11:     #6 void tbb::detail::d1::adaptive_wait_on_address<tbb::detail::d1::rw_mutex::lock()::'lambda'()>(void*, tbb::detail::d1::rw_mutex::lock()::'lambda'(), unsigned long) _waitable_atomic.h:38 (libtbb_debug.12.dylib:x86_64+0x604c4)
11:     #7 tbb::detail::d1::rw_mutex::lock() rw_mutex.h:64 (libtbb_debug.12.dylib:x86_64+0x5ba71)
11:     #8 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::acquire(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:106 (libtbb_debug.12.dylib:x86_64+0x619c3)
11:     #9 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:86 (libtbb_debug.12.dylib:x86_64+0x618b7)
11:     #10 tbb::detail::d1::rw_scoped_lock<tbb::detail::d1::rw_mutex>::rw_scoped_lock(tbb::detail::d1::rw_mutex&, bool) _scoped_lock.h:85 (libtbb_debug.12.dylib:x86_64+0x5a8ac)
11:     #11 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:517 (libtbb_debug.12.dylib:x86_64+0x5d09b)
11:     #12 tbb::detail::r1::arena::is_out_of_work() arena.cpp:320 (libtbb_debug.12.dylib:x86_64+0xd0de)
11:     #13 tbb::detail::r1::waiter_base::pause() waiters.h:36 (libtbb_debug.12.dylib:x86_64+0x26823)
11:     #14 tbb::detail::r1::outermost_worker_waiter::pause(tbb::detail::r1::arena_slot&) waiters.h:69 (libtbb_debug.12.dylib:x86_64+0x24014)
11:     #15 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::r1::thread_data&, tbb::detail::r1::execution_data_ext&, tbb::detail::r1::outermost_worker_waiter&, long, bool, bool) task_dispatcher.h:231 (libtbb_debug.12.dylib:x86_64+0x27940)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:343 (libtbb_debug.12.dylib:x86_64+0x1ee5f)
11:     #17 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter>(tbb::detail::d1::task*, tbb::detail::r1::outermost_worker_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0xa2d9)
11:     #18 tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) arena.cpp:133 (libtbb_debug.12.dylib:x86_64+0x9500)
11:     #19 tbb::detail::r1::market::process(rml::job&) market.cpp:596 (libtbb_debug.12.dylib:x86_64+0x5dca9)
11:     #20 tbb::detail::r1::rml::private_worker::run() private_server.cpp:267 (libtbb_debug.12.dylib:x86_64+0x6e4d6)
11:     #21 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Previous read of size 4 at 0x7e80034a7108 by thread T2:
11:     [failed to restore the stack]
11: 
11:   Location is stack of thread T11.
11: 
11:   Thread T11 (tid=28826241, running) created by thread T6 at:
11:     #0 pthread_create <null>:55209120 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::propagate_chain_reaction() private_server.cpp:159 (libtbb_debug.12.dylib:x86_64+0x6f200)
11:     #5 tbb::detail::r1::rml::private_worker::run() private_server.cpp:258 (libtbb_debug.12.dylib:x86_64+0x6e3f6)
11:     #6 tbb::detail::r1::rml::private_worker::thread_routine(void*) private_server.cpp:221 (libtbb_debug.12.dylib:x86_64+0x6e36d)
11: 
11:   Thread T2 (tid=28826230, running) created by main thread at:
11:     #0 pthread_create <null>:55209120 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2933d)
11:     #1 tbb::detail::r1::rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) rml_thread_monitor.h:208 (libtbb_debug.12.dylib:x86_64+0x72567)
11:     #2 tbb::detail::r1::rml::private_worker::wake_or_launch() private_server.cpp:299 (libtbb_debug.12.dylib:x86_64+0x70e9f)
11:     #3 tbb::detail::r1::rml::private_server::wake_some(int) private_server.cpp:397 (libtbb_debug.12.dylib:x86_64+0x70523)
11:     #4 tbb::detail::r1::rml::private_server::adjust_job_count_estimate(int) private_server.cpp:408 (libtbb_debug.12.dylib:x86_64+0x710b0)
11:     #5 tbb::detail::r1::market::adjust_demand(tbb::detail::r1::arena&, int, bool) market.cpp:585 (libtbb_debug.12.dylib:x86_64+0x5d7f8)
11:     #6 void tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0>() arena.h:540 (libtbb_debug.12.dylib:x86_64+0x83722)
11:     #7 tbb::detail::r1::spawn_and_notify(tbb::detail::d1::task&, tbb::detail::r1::arena_slot*, tbb::detail::r1::arena*) task_dispatcher.cpp:26 (libtbb_debug.12.dylib:x86_64+0x828e2)
11:     #8 tbb::detail::r1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:39 (libtbb_debug.12.dylib:x86_64+0x8285a)
11:     #9 tbb::detail::d1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) _task.h:187 (test_parallel_for:x86_64+0x1000344bd)
11:     #10 tbb::detail::d1::auto_partition_type::spawn_task(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&) partitioner.h:504 (test_parallel_for:x86_64+0x1000343f4)
11:     #11 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::spawn_self(tbb::detail::d1::execution_data&) parallel_for.h:121 (test_parallel_for:x86_64+0x1000336a7)
11:     #12 void tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work_impl<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d0::split&>(tbb::detail::d1::execution_data&, tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&&&, tbb::detail::d0::split&&&) parallel_for.h:117 (test_parallel_for:x86_64+0x100033344)
11:     #13 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::offer_work(tbb::detail::d0::split&, tbb::detail::d1::execution_data&) parallel_for.h:99 (test_parallel_for:x86_64+0x100032bd9)
11:     #14 void tbb::detail::d1::partition_type_base<tbb::detail::d1::auto_partition_type>::execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) partitioner.h:284 (test_parallel_for:x86_64+0x100032511)
11:     #15 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) parallel_for.h:147 (test_parallel_for:x86_64+0x100031e6a)
11:     #16 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:315 (libtbb_debug.12.dylib:x86_64+0x8d752)
11:     #17 tbb::detail::d1::task* tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter>(tbb::detail::d1::task*, tbb::detail::r1::external_waiter&) task_dispatcher.h:456 (libtbb_debug.12.dylib:x86_64+0x84079)
11:     #18 tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:168 (libtbb_debug.12.dylib:x86_64+0x839fb)
11:     #19 tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) task_dispatcher.cpp:121 (libtbb_debug.12.dylib:x86_64+0x8380f)
11:     #20 tbb::detail::d1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) _task.h:196 (test_parallel_for:x86_64+0x100031868)
11:     #21 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&, tbb::detail::d1::task_group_context&) parallel_for.h:89 (test_parallel_for:x86_64+0x100031575)
11:     #22 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> >, tbb::detail::d1::auto_partitioner const>::run(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&, tbb::detail::d1::auto_partitioner const&) parallel_for.h:78 (test_parallel_for:x86_64+0x10003133a)
11:     #23 void tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<int>, SSE_Functor<ClassWithVectorType<float vector[4]> > >(tbb::detail::d1::blocked_range<int> const&, SSE_Functor<ClassWithVectorType<float vector[4]> > const&) parallel_for.h:205 (test_parallel_for:x86_64+0x100030bfd)
11:     #24 void TestVectorTypes<ClassWithVectorType<float vector[4]> >() test_parallel_for.cpp:76 (test_parallel_for:x86_64+0x100030502)
11:     #25 _DOCTEST_ANON_FUNC_84() test_parallel_for.cpp:316 (test_parallel_for:x86_64+0x10001d275)
11:     #26 doctest::Context::run() doctest.h:5882 (test_parallel_for:x86_64+0x100015653)
11:     #27 main doctest.h:5962 (test_parallel_for:x86_64+0x10001959d)
11: 
11: SUMMARY: ThreadSanitizer: data race semaphore.h:249 in tbb::detail::r1::binary_semaphore::binary_semaphore()
11: ==================
11: ThreadSanitizer: reported 4 warnings

seanm avatar Mar 09 '21 18:03 seanm

It seems like a new issue ; however, this feature is not released yet. If possible, can you please try the previous revision: https://github.com/oneapi-src/oneTBB/tree/b15aabb39a12a4dcddb6ee886265f2bf2f972650?

notify @pavelkumbrasev

alexey-katranov avatar Mar 12 '21 06:03 alexey-katranov

It turned out that Thread Sanitizer does not recognize synchronization via semaphores. The following code reports a race:

#include <thread>
#include <iostream>

#include <mach/semaphore.h>
#include <mach/task.h>
#include <mach/mach_init.h>
#include <mach/error.h>

class binary_semaphore {
public:
    binary_semaphore() : my_sem(0) {
        kern_return_t ret = semaphore_create( mach_task_self(), &my_sem, SYNC_POLICY_FIFO, 0 );
    }
    ~binary_semaphore() {
        kern_return_t ret = semaphore_destroy( mach_task_self(), my_sem );
    }
    //! wait/acquire
    void P() {
        int ret;
        do {
            ret = semaphore_wait( my_sem );
        } while( ret==KERN_ABORTED );
    }
    //! post/release
    void V() { semaphore_signal( my_sem ); }
private:
    semaphore_t my_sem;
};

int main() {
    int data{};
    binary_semaphore sem;
    std::thread thr([&sem, &data]{
        data = 1;                   // Thread Sanitizer reports a race over `data`
        sem.V();
    });
    sem.P();
    std::cout << "data = " << 
            data                    // Thread Sanitizer reports a race over `data`
            << "\n";
    thr.join();
    return 0;
}

The Thread Sanitizer output:

==================
WARNING: ThreadSanitizer: data race (pid=68470)
  Read of size 4 at 0x7ffee593abd8 by main thread:
    #0 main tsan_sem.cpp:39 (a.out:x86_64+0x1000012cd)

  Previous write of size 4 at 0x7ffee593abd8 by thread T1:
    #0 main::$_0::operator()() const tsan_sem.cpp:34 (a.out:x86_64+0x100002d5a)
    #1 decltype(std::__1::forward<main::$_0>(fp)()) std::__1::__invoke<main::$_0>(main::$_0&&) type_traits:4361 (a.out:x86_64+0x100002c60)
    #2 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0>&, std::__1::__tuple_indices<>) thread:342 (a.out:x86_64+0x100002ac8)
    #3 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0> >(void*) thread:352 (a.out:x86_64+0x100001c59)

  Location is stack of main thread.

  Thread T1 (tid=127433193, finished) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2a99d)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:328 (a.out:x86_64+0x100001b9e)
    #2 std::__1::thread::thread<main::$_0, void>(main::$_0&&) thread:368 (a.out:x86_64+0x10000183e)
    #3 std::__1::thread::thread<main::$_0, void>(main::$_0&&) thread:360 (a.out:x86_64+0x100001448)
    #4 main tsan_sem.cpp:33 (a.out:x86_64+0x100001295)

SUMMARY: ThreadSanitizer: data race tsan_sem.cpp:39 in main
==================
data = 1
ThreadSanitizer: reported 1 warnings
Abort trap: 6

alexey-katranov avatar Mar 12 '21 08:03 alexey-katranov

I searched through the llvm and sanitizers bug databases, and not see anything about mach semaphores in there. I don't think those APIs are commonly used. So you believe that snippet to be correct, and a TSan limitation? If so, I can file a bug with them.

seanm avatar Mar 12 '21 21:03 seanm

I am also not an expert in mach semaphores but at first glance the example is correct (at least TBB is based on this implementation for many years). So, it makes sense to discuss the issue with Thread Sanitizer maintainers.

alexey-katranov avatar Mar 15 '21 12:03 alexey-katranov

@alexey-katranov OK, I created https://github.com/google/sanitizers/issues/1384, let's see if they have anything to say...

seanm avatar Mar 16 '21 20:03 seanm

@alexey-katranov meanwhile, it would be nice if TBB had an "official" suppression list of known issues.

My perspective is the following: I have my own code, with my many tests, and they all pass with TSan. Now I'm thinking of adding TBB as a dependency to one of my dependencies (ex: OpenCV) and now suddenly a bunch of my own tests fail with TSan due to the 'semi-false positives' in TBB. This dissuades me from using TBB.

seanm avatar Mar 17 '21 16:03 seanm

it would be nice if TBB had an "official" suppression list of known issues.

Yes, it makes sense.

alexey-katranov avatar Mar 18 '21 05:03 alexey-katranov

@seanm Wonder if you can share your current suppression list? Many thanks!

ProfFan avatar Nov 06 '21 13:11 ProfFan

@ProfFan I don't have one to share. I evaluated using TBB, but TSan is more important to me, so I just never started using TBB.

seanm avatar Nov 06 '21 15:11 seanm

@alexey-katranov Mach semaphore can be replaced with dispatch semaphore (https://developer.apple.com/documentation/dispatch/dispatch_semaphore).

I don't know how it affects to performance, but it will suppress false positive thread sanitizer errors.

phprus avatar Jan 07 '22 14:01 phprus

@alexey-katranov What's the status on OneTBB supporting TSan properly?

jdumas avatar Aug 01 '22 18:08 jdumas

@jdumas sorry for the long response. Which platforms are you interested in?

pavelkumbrasev avatar Sep 18 '22 08:09 pavelkumbrasev

Both Linux and macOS.

jdumas avatar Sep 18 '22 15:09 jdumas

On Linux we have 100% pass rate with ThreadSanitizer. However, on macOS we have issues mentioned above. With proposed by @phprus changes I have successfully ran basics test with no false positives on macOS. (I ran tests without tbbmalloc)

pavelkumbrasev avatar Sep 18 '22 17:09 pavelkumbrasev

Hmm I'm still observing a lot of TSan issues running tests in parallel with Catch2 and OneTBB. I'll try to provide a more concrete example in the upcoming weeks.

jdumas avatar Sep 21 '22 07:09 jdumas

@jdumas could you please share your status?

isaevil avatar Oct 05 '22 11:10 isaevil

@jdumas any updates?

isaevil avatar Oct 13 '22 08:10 isaevil

Sorry haven't gotten around to it yet. I'll try to share an example today or tomorrow.

jdumas avatar Oct 13 '22 14:10 jdumas

@jdumas Have you had a chance to take care of this example?

isaevil avatar Oct 25 '22 09:10 isaevil

Hi, sorry I haven't been able to build a MWE yet. I have setup a github action on my repository (here) that shows a bunch of TSan issues. At this point I am still working through the code to determine if it's a data within my code (I've already fixed a few of those) or an issue with OneTBB (which I have not been able to isolate separately on Linux yet).

jdumas avatar Nov 02 '22 20:11 jdumas

Closing this issue because Mac TSan supported via PR#911. @jdumas Please feel free to reopen it or create new one once you will get failure logs.

pavelkumbrasev avatar Dec 06 '22 08:12 pavelkumbrasev

Hi,

So I'm hitting a TSan issue on Ubuntu with LLVM 14 with tbbmalloc enabled on a simple parallel_for test. The detailed build log is available here, and the test code is here. Is is expected that this simple parallel_for with tbbmalloc produces a hit with TSan?

jdumas avatar Apr 06 '23 22:04 jdumas

Answering my own question: it seems the error go away if I set TBB_SANITIZE to thread, which causes __TBB_USE_ITT_NOTIFY to NOT be defined.

jdumas avatar Apr 07 '23 01:04 jdumas