Test ``test_global_control`` hangs in release build and fails in debug build
A new bug was discovered after applying PR #739.
Commit: 0ef804884856e791fe7bb173078a92daf3929f76
Release build with LTO:
phprus@mbp release % ps ax | grep test
31860 s011 S+ 0:04.12 ./appleclang_13.0_cxx17_64_release/test_global_control
31876 s014 R+ 0:00.00 grep test
phprus@mbp release % lldb --attach-pid 31860
(lldb) process attach --pid 31860
Process 31860 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
libsystem_kernel.dylib`__ulock_wait:
-> 0x1b079370c <+8>: b.lo 0x1b079372c ; <+40>
0x1b0793710 <+12>: pacibsp
0x1b0793714 <+16>: stp x29, x30, [sp, #-0x10]!
0x1b0793718 <+20>: mov x29, sp
Target 0: (test_global_control) stopped.
Executable module set to "/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/build/release/appleclang_13.0_cxx17_64_release/test_global_control".
Architecture set to: arm64e-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
frame #1: 0x00000001b07cf574 libsystem_pthread.dylib`_pthread_join + 452
frame #2: 0x00000001b072873c libc++.1.dylib`std::__1::thread::join() + 36
frame #3: 0x000000010290632c test_global_control`DOCTEST_ANON_FUNC_51() + 108
frame #4: 0x0000000102903694 test_global_control`main + 11016
frame #5: 0x0000000102ced0f4 dyld`start + 520
(lldb) thread select 2
* thread #2
frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
-> 0x1b0794ebc <+8>: b.lo 0x1b0794edc ; <+40>
0x1b0794ec0 <+12>: pacibsp
0x1b0794ec4 <+16>: stp x29, x30, [sp, #-0x10]!
0x1b0794ec8 <+20>: mov x29, sp
(lldb) bt
* thread #2
* frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
frame #5: 0x0000000102a9f104 libtbb.12.6.dylib`tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) + 1028
frame #6: 0x000000010291915c test_global_control`tbb::detail::d1::task_arena_function<TestBlockingTerminateNS::ExceptionTest2::operator()()::'lambda'(), void>::operator()() const + 276
frame #7: 0x0000000102a8f458 libtbb.12.6.dylib`tbb::detail::r1::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) + 312
frame #8: 0x0000000102909344 test_global_control`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void utils::NativeParallelFor<int, DOCTEST_ANON_FUNC_51()::$_4>(int, DOCTEST_ANON_FUNC_51()::$_4 const&)::'lambda'()> >(void*) + 432
frame #9: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 3
* thread #3
frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
-> 0x1b0791990 <+8>: ret
libsystem_kernel.dylib`semaphore_wait_signal_trap:
0x1b0791994 <+0>: mov x16, #-0x25
0x1b0791998 <+4>: svc #0x80
0x1b079199c <+8>: ret
(lldb) bt
* thread #3
* frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 4
* thread #4
frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
-> 0x1b0791990 <+8>: ret
libsystem_kernel.dylib`semaphore_wait_signal_trap:
0x1b0791994 <+0>: mov x16, #-0x25
0x1b0791998 <+4>: svc #0x80
0x1b079199c <+8>: ret
(lldb) bt
* thread #4
* frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 5
* thread #5
frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
-> 0x1b0794ebc <+8>: b.lo 0x1b0794edc ; <+40>
0x1b0794ec0 <+12>: pacibsp
0x1b0794ec4 <+16>: stp x29, x30, [sp, #-0x10]!
0x1b0794ec8 <+20>: mov x29, sp
(lldb) bt
* thread #5
* frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 6
* thread #6
frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
-> 0x1b0794ebc <+8>: b.lo 0x1b0794edc ; <+40>
0x1b0794ec0 <+12>: pacibsp
0x1b0794ec4 <+16>: stp x29, x30, [sp, #-0x10]!
0x1b0794ec8 <+20>: mov x29, sp
(lldb) bt
* thread #6
* frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb)
Output:
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
^C===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239:
TEST CASE: prolong lifetime advanced
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239: FATAL ERROR: test case CRASHED: SIGINT - Terminal interrupt signal
===============================================================================
[doctest] test cases: 3 | 2 passed | 1 failed | 1 skipped
[doctest] assertions: 9 | 9 passed | 0 failed |
[doctest] Status: FAILURE!
Debug build:
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
Assertion pred() failed (located in the enforce function, line in file: 173)
===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250:
TEST CASE: prolong lifetime multiple wait
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250: FATAL ERROR: test case CRASHED: SIGABRT - Abort (abnormal termination) signal
===============================================================================
[doctest] test cases: 4 | 3 passed | 1 failed | 0 skipped
[doctest] assertions: 58 | 58 passed | 0 failed |
[doctest] Status: FAILURE!
zsh: abort ./appleclang_13.0_cxx17_64_debug/test_global_control
Failing assertion: https://github.com/oneapi-src/oneTBB/blob/0ef804884856e791fe7bb173078a92daf3929f76/src/tbb/market.h#L171-L174
See: https://github.com/oneapi-src/oneTBB/pull/739#issuecomment-1016508781 https://github.com/oneapi-src/oneTBB/pull/739#issuecomment-1016771535
Commit: cd6a5f9f4a5bae9fc157fa03093c17b9f861c9f2
The behavior doesn't change.
Yes, it seems to be a different issue. The observed assertion concerns the relationship between the market and the arena, while #739 fixes an issue in private_server (the thread pool).
Any news?
It seems the testing approach is broken on ARM: any test using utils::SpinBarrier might hang even when there is no real issue in the library. We are thinking about a better testing approach.
So this is a bug in the tests and not in the oneTBB library? That is good news for me!
Currently, it is difficult to say that the observed anomalies are fully unrelated to the oneTBB library. The deadlocks are mostly related to the testing approach, but the assertion seems to be of a different nature.
@phprus, can you please check that the test no longer fails in debug?
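The comment does not say why utils::SpinBarrier hangs on ARM, and its implementation is not shown here. For reference only, here is a hypothetical generation-counting spin barrier (`GenerationBarrier` and `run_barrier_demo` are illustrative names, not the oneTBB test suite's `utils::SpinBarrier`) that spells out the acquire/release pairs a reusable spin barrier relies on for cross-thread visibility. It is a sketch under those assumptions, not a proposed fix.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical reusable spin barrier (NOT utils::SpinBarrier): `size` threads
// block in wait() until all of them have arrived, then the next phase starts.
class GenerationBarrier {
public:
    explicit GenerationBarrier(std::size_t size)
        : size_(size), remaining_(size), generation_(0) {
        assert(size > 0 && "barrier needs at least one participant");
    }

    void wait() {
        // Snapshot the current phase before arriving.
        const std::size_t gen = generation_.load(std::memory_order_acquire);
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arrival: re-arm the counter, then open the barrier.
            // The release increment publishes the re-armed counter.
            remaining_.store(size_, std::memory_order_relaxed);
            generation_.fetch_add(1, std::memory_order_release);
        } else {
            // Spin (yielding the core) until the last arrival bumps the phase.
            while (generation_.load(std::memory_order_acquire) == gen) {
                std::this_thread::yield();
            }
        }
    }

private:
    const std::size_t size_;
    std::atomic<std::size_t> remaining_;
    std::atomic<std::size_t> generation_;
};

// Hypothetical demo: every thread increments `arrived` once per phase; after
// the barrier, all `threads` increments of that phase must be visible.
// Returns true if the invariant held in every phase.
bool run_barrier_demo(std::size_t threads, std::size_t phases) {
    GenerationBarrier barrier(threads);
    std::atomic<std::size_t> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (std::size_t p = 0; p < phases; ++p) {
                arrived.fetch_add(1, std::memory_order_relaxed);
                barrier.wait();
                if (arrived.load(std::memory_order_relaxed) < (p + 1) * threads) {
                    ok.store(false, std::memory_order_relaxed);
                }
                barrier.wait();  // second crossing also exercises barrier reuse
            }
        });
    }
    for (auto& th : pool) th.join();
    return ok.load();
}
```

The visibility check works because each arrival's `fetch_sub` is a release operation that the last arrival acquires, and the last arrival's release increment of `generation_` is acquired by every waiter.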
Reopening the issue because it is not yet fixed in release.
@alexey-katranov
Debug build works fine:
phprus@mbp debug % ./appleclang_13.0_cxx17_64_debug/test_global_control
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
===============================================================================
[doctest] test cases: 4 | 4 passed | 0 failed | 0 skipped
[doctest] assertions: 86 | 86 passed | 0 failed |
[doctest] Status: SUCCESS!
The release build still hangs.
Thank you for the confirmation. As for release, we have some experiments in the dev/alexey-katranov/exp-seq-cst branch, but it is not clear whether we will go with this approach because it might affect performance.
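The branch name suggests experiments with sequentially consistent (`std::memory_order_seq_cst`) atomics; the branch's actual changes are not described in this thread. As generic background only, the store-buffering litmus test below shows what seq_cst adds over release/acquire: when all four accesses are seq_cst, the outcome in which both threads read 0 is forbidden, while with release stores and acquire loads it is permitted. That extra guarantee is also where the potential cost comes from; for example, on x86 a seq_cst store needs a full barrier (`xchg` or `mfence`). The function name `count_both_zero` and the round counts are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering ("Dekker") litmus test: each thread publishes its own flag,
// then reads the other thread's flag. Returns how many of `rounds` independent
// trials ended with both threads reading 0.
int count_both_zero(int rounds, std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // written by the threads, read after join()
        std::thread a([&] {
            x.store(1, store_order);
            r1 = y.load(load_order);
        });
        std::thread b([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        a.join();
        b.join();
        assert(r1 != -1 && r2 != -1);  // both threads actually ran
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` for both arguments the count is always 0. With release/acquire a nonzero count is allowed but not guaranteed to show up in any particular run, which is one reason ordering bugs of this kind can surface only after many iterations.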
@alexey-katranov
Commit: 29fb19bf041ebdc4a0ec842700c000ba913dff96 (https://github.com/oneapi-src/oneTBB/tree/dev/alexey-katranov/exp-seq-cst)
test_global_control and test_arena_constraints (https://github.com/oneapi-src/oneTBB/issues/756) work without hangs.
Sounds great. Unfortunately, conformance_parallel_pipeline still hangs after ~859800 runs...
I limited the total run time to 15 minutes...
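The thread does not say how the repeated runs were driven. As an illustration only, a hypothetical in-process driver (`run_until_deadline` and `StressReport` are made-up names) that repeats a check until it fails, a run cap is reached, or a wall-clock budget such as 15 minutes is exhausted could look like this. A budget checked between iterations cannot interrupt an iteration that itself hangs, so an external timeout (for example CTest's `--timeout`) is still needed to turn a hang into a failure.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Outcome of a stress session: how many iterations started, and whether one
// of them reported a failure.
struct StressReport {
    std::uint64_t runs;
    bool failed;
};

// Hypothetical driver: runs `iteration` until it returns false, `max_runs`
// iterations have started, or `budget` of wall-clock time has elapsed.
StressReport run_until_deadline(const std::function<bool()>& iteration,
                                std::chrono::steady_clock::duration budget,
                                std::uint64_t max_runs) {
    assert(iteration && "iteration callback must be set");
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::uint64_t runs = 0;
    while (runs < max_runs && std::chrono::steady_clock::now() < deadline) {
        ++runs;
        if (!iteration()) return {runs, true};  // stop at the first failure
    }
    return {runs, false};
}
```

For a 15-minute session one would pass `std::chrono::minutes(15)` as the budget and a large `max_runs`.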
Is there any news on this issue?