Test ``test_global_control`` hangs in release build and fails in debug build

Open phprus opened this issue 3 years ago • 13 comments

A new bug was discovered after applying PR #739.

Commit: 0ef804884856e791fe7bb173078a92daf3929f76

Release build with LTO:

phprus@mbp release % ps ax | grep test
31860 s011  S+     0:04.12 ./appleclang_13.0_cxx17_64_release/test_global_control
31876 s014  R+     0:00.00 grep test
phprus@mbp release % lldb --attach-pid 31860
(lldb) process attach --pid 31860
Process 31860 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
libsystem_kernel.dylib`__ulock_wait:
->  0x1b079370c <+8>:  b.lo   0x1b079372c               ; <+40>
    0x1b0793710 <+12>: pacibsp
    0x1b0793714 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0793718 <+20>: mov    x29, sp
Target 0: (test_global_control) stopped.

Executable module set to "/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/build/release/appleclang_13.0_cxx17_64_release/test_global_control".
Architecture set to: arm64e-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x00000001b07cf574 libsystem_pthread.dylib`_pthread_join + 452
    frame #2: 0x00000001b072873c libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x000000010290632c test_global_control`DOCTEST_ANON_FUNC_51() + 108
    frame #4: 0x0000000102903694 test_global_control`main + 11016
    frame #5: 0x0000000102ced0f4 dyld`start + 520
(lldb) thread select 2
* thread #2
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #2
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a9f104 libtbb.12.6.dylib`tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) + 1028
    frame #6: 0x000000010291915c test_global_control`tbb::detail::d1::task_arena_function<TestBlockingTerminateNS::ExceptionTest2::operator()()::'lambda'(), void>::operator()() const + 276
    frame #7: 0x0000000102a8f458 libtbb.12.6.dylib`tbb::detail::r1::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) + 312
    frame #8: 0x0000000102909344 test_global_control`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void utils::NativeParallelFor<int, DOCTEST_ANON_FUNC_51()::$_4>(int, DOCTEST_ANON_FUNC_51()::$_4 const&)::'lambda'()> >(void*) + 432
    frame #9: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 3
* thread #3
    frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x1b0791990 <+8>: ret

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x1b0791994 <+0>: mov    x16, #-0x25
    0x1b0791998 <+4>: svc    #0x80
    0x1b079199c <+8>: ret
(lldb) bt
* thread #3
  * frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
    frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 4
* thread #4
    frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x1b0791990 <+8>: ret

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x1b0791994 <+0>: mov    x16, #-0x25
    0x1b0791998 <+4>: svc    #0x80
    0x1b079199c <+8>: ret
(lldb) bt
* thread #4
  * frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
    frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 5
* thread #5
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #5
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
    frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
    frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 6
* thread #6
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #6
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
    frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
    frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb)

Output:

[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options










^C===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239:
TEST CASE:  prolong lifetime advanced

/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239: FATAL ERROR: test case CRASHED: SIGINT - Terminal interrupt signal

===============================================================================
[doctest] test cases: 3 | 2 passed | 1 failed | 1 skipped
[doctest] assertions: 9 | 9 passed | 0 failed |
[doctest] Status: FAILURE!

Debug build:

[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
Assertion pred() failed (located in the enforce function, line in file: 173)
===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250:
TEST CASE:  prolong lifetime multiple wait

/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250: FATAL ERROR: test case CRASHED: SIGABRT - Abort (abnormal termination) signal

===============================================================================
[doctest] test cases:  4 |  3 passed | 1 failed | 0 skipped
[doctest] assertions: 58 | 58 passed | 0 failed |
[doctest] Status: FAILURE!
zsh: abort      ./appleclang_13.0_cxx17_64_debug/test_global_control

Assert: https://github.com/oneapi-src/oneTBB/blob/0ef804884856e791fe7bb173078a92daf3929f76/src/tbb/market.h#L171-L174

See: https://github.com/oneapi-src/oneTBB/pull/739#issuecomment-1016508781 https://github.com/oneapi-src/oneTBB/pull/739#issuecomment-1016771535

phprus avatar Jan 21 '22 12:01 phprus

Commit: cd6a5f9f4a5bae9fc157fa03093c17b9f861c9f2 The behavior doesn't change.

phprus avatar Jan 27 '22 17:01 phprus

The behavior doesn't change.

Yes, it seems to be another issue. The observed assertion concerns the relation between the market and the arena, while #739 fixes an issue in private_server (the thread pool).

alex-katranov avatar Jan 27 '22 17:01 alex-katranov

Any news?

phprus avatar Feb 16 '22 16:02 phprus

It seems the testing approach is broken on ARM: any test using utils::SpinBarrier might hang without there being a real issue. We are thinking about a better approach for testing.

alex-katranov avatar Feb 16 '22 16:02 alex-katranov
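
For readers unfamiliar with the pattern, the following is a minimal sketch of a spin barrier and of a test built on top of it. It is not the utils::SpinBarrier from the oneTBB test suite; the names SimpleSpinBarrier and run_barrier_demo are hypothetical and exist only for this illustration. The point it tries to show is that a barrier of this kind blocks until every expected participant arrives, so a test that relies on it reports a missing participant as a hang rather than as a failed assertion.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration only; NOT the oneTBB utils::SpinBarrier.
class SimpleSpinBarrier {
public:
    explicit SimpleSpinBarrier(std::size_t n) : num_threads_(n) {}

    // Blocks until num_threads_ threads have called wait() in this phase.
    void wait() {
        const std::size_t epoch = epoch_.load(std::memory_order_acquire);
        if (arrived_.fetch_add(1, std::memory_order_acq_rel) + 1 == num_threads_) {
            // Last arriver: reset the counter, then release the waiters.
            arrived_.store(0, std::memory_order_relaxed);
            epoch_.store(epoch + 1, std::memory_order_release);
        } else {
            // Spins forever if some participant never arrives.
            while (epoch_.load(std::memory_order_acquire) == epoch) {
                std::this_thread::yield();
            }
        }
    }

private:
    const std::size_t num_threads_;
    std::atomic<std::size_t> arrived_{0};
    std::atomic<std::size_t> epoch_{0};
};

// Runs `threads` threads through `phases` phases and returns how many times
// a thread saw a counter value that does not match the completed phase.
std::size_t run_barrier_demo(std::size_t threads, std::size_t phases) {
    SimpleSpinBarrier barrier(threads);
    std::atomic<std::size_t> counter{0};
    std::atomic<std::size_t> mismatches{0};
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (std::size_t p = 0; p < phases; ++p) {
                counter.fetch_add(1);
                barrier.wait();  // every thread has incremented for phase p
                if (counter.load() != (p + 1) * threads) {
                    mismatches.fetch_add(1);
                }
                barrier.wait();  // nobody starts phase p + 1 before all have checked
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return mismatches.load();
}
```

Calling run_barrier_demo(4, 100) is expected to return 0. If the barrier were constructed for more threads than actually call wait(), the same call would never return, which is the kind of symptom that is hard to tell apart from a real deadlock in the library under test.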

Is this a bug in the tests, not in the oneTBB library? This is good news for me!

phprus avatar Feb 16 '22 17:02 phprus

Currently, it is difficult to say that the observed anomalies are fully unrelated to the oneTBB library. The deadlocks are mostly related to the testing approach, but the assertion seems to be of a different nature.

alex-katranov avatar Feb 16 '22 20:02 alex-katranov

@phprus, can you please check that the test does not fail in debug?

Reopening the issue because it is not fixed in release yet.

alex-katranov avatar Feb 21 '22 08:02 alex-katranov

@alexey-katranov

Debug build works fine:

phprus@mbp debug % ./appleclang_13.0_cxx17_64_debug/test_global_control
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
===============================================================================
[doctest] test cases:  4 |  4 passed | 0 failed | 0 skipped
[doctest] assertions: 86 | 86 passed | 0 failed |
[doctest] Status: SUCCESS!

Release build - hangs.

phprus avatar Feb 21 '22 13:02 phprus

Thank you for the confirmation. As for release, we have some experiments in dev/alexey-katranov/exp-seq-cst, but it is not clear whether we will go with this approach because it might affect performance.

alex-katranov avatar Feb 21 '22 13:02 alex-katranov
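
The branch name suggests an experiment with sequentially consistent atomics, although the thread does not say what the branch actually changes, so that reading is an assumption. As background, the sketch below shows the classic store-buffering litmus test, which is one well-known case where sequential consistency gives a guarantee that weaker orderings do not. The function name count_both_zero is hypothetical and the code is unrelated to the oneTBB sources.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Store-buffering litmus test (illustration only, not oneTBB code).
// Each iteration runs two threads:
//   A: x = 1; r1 = y;      B: y = 1; r2 = x;
// Returns the number of iterations in which both threads read 0.
// The caller must pass orders that are valid for stores and loads respectively.
std::size_t count_both_zero(std::size_t iterations,
                            std::memory_order store_order,
                            std::memory_order load_order) {
    std::size_t both_zero = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        std::thread a([&] {
            x.store(1, store_order);
            r1 = y.load(load_order);
        });
        std::thread b([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With std::memory_order_seq_cst for both arguments, the C++ memory model forbids the outcome in which both threads read 0, so the function returns 0. With std::memory_order_release for the stores and std::memory_order_acquire for the loads, that outcome is permitted, although a given run may never observe it. Sequential consistency is the strongest ordering, which is consistent with the performance concern raised above.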

@alexey-katranov

Commit: 29fb19bf041ebdc4a0ec842700c000ba913dff96 (https://github.com/oneapi-src/oneTBB/tree/dev/alexey-katranov/exp-seq-cst)

test_global_control and test_arena_constraints (https://github.com/oneapi-src/oneTBB/issues/756) work without hangs.

phprus avatar Feb 21 '22 14:02 phprus

test_global_control and test_arena_constraints (https://github.com/oneapi-src/oneTBB/issues/756) work without hangs.

Sounds great. Unfortunately, conformance_parallel_pipeline still hangs after ~859800 runs...

alex-katranov avatar Feb 21 '22 14:02 alex-katranov

I limited the total run time to 15 minutes...

phprus avatar Feb 21 '22 14:02 phprus
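
The thread does not say how the repeated runs were driven. One way to repeat a check under a fixed time budget can be sketched as follows; the names RepeatResult and repeat_within_budget are hypothetical, and the callable stands in for whatever is being stressed.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Outcome of a time-boxed stress run (hypothetical helper type).
struct RepeatResult {
    std::size_t runs;  // number of times the body was invoked
    bool failed;       // true if the body returned false at least once
};

// Calls `body` repeatedly until it returns false or the budget is used up.
// The body always runs at least once. A body that never returns is NOT
// interrupted; detecting a hang needs an external watchdog or timeout.
template <typename Body>
RepeatResult repeat_within_budget(Body&& body, std::chrono::milliseconds budget) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::size_t runs = 0;
    do {
        ++runs;
        if (!std::forward<Body>(body)()) {
            return {runs, true};
        }
    } while (std::chrono::steady_clock::now() < deadline);
    return {runs, false};
}
```

A 15-minute budget would be passed as std::chrono::minutes(15). Note that a clean result only means no failure was seen within the budget: the report above of a hang after roughly 859800 runs shows that a rare problem can need far more iterations than a short budget allows.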

Is there any news on this issue?

phprus avatar Jun 07 '22 13:06 phprus