Failing tests and deadlocks on linux/aarch64 (Amazon Graviton2)

Open ulfworsoe opened this issue 4 years ago • 14 comments

Found on:

  • Linux (Ubuntu 20.04.3) on aarch64 (Amazon Graviton2)
  • TBB version 2021.4.0 and master (883c2e5245c39624b3b5d6d56d5b203cf09eac38)
  • building with cmake+ninja (CC=clang CXX=clang++ cmake -GNinja -DCMAKE_INSTALL_PREFIX=$HOME/local/2021.4.0 -DCMAKE_BUILD_TYPE=Release).

Running the TBB tests, some tests failed:

  • In master and 2021.4.0, test_composite_node appears to hang; killed manually after ~1000 seconds
  • In 2021.4.0, test_concurrent_vector appears to hang; killed manually after ~1000 seconds
  • test_eh_thread aborts

ulfworsoe avatar Dec 16 '21 08:12 ulfworsoe

More info in the duplicate but closed issue: https://github.com/oneapi-src/oneTBB/issues/688

erling-d-andersen avatar Dec 16 '21 09:12 erling-d-andersen

@anton-potapov

I have been working on installing and testing this package for the amd64 and arm64 architectures. While testing, I am getting 2 test failures on my local ARM server. It would be really helpful if you could share some pointers.

98% tests passed; 2 tests failed out of 133 
Total Test time (real) = 476.32 sec 
The following tests FAILED: 
         28 - test_resumable_tasks (SEGFAULT) 
         60 - test_eh_thread (Subprocess aborted) 
Errors while running CTest

Log for reference: oneTBB_tests_arm64.txt

odidev avatar Dec 16 '21 11:12 odidev

test_eh_thread seems strange. It could not create 1024 threads and received an error from pthread_create, but then oneTBB was able to create one more thread. Try setting a breakpoint on pthread_create inside rml_thread_monitor.h and check why it does not fail.

As for the other tests, it is quite difficult to guess what is going wrong (it might be related to the relaxed memory model of aarch64). Is it possible to share core dumps of the hung tests?

alex-katranov avatar Dec 20 '21 11:12 alex-katranov

Due to Christmas vacation we are not likely to return before January. Sorry.

erling-d-andersen avatar Dec 21 '21 10:12 erling-d-andersen

@alexey-katranov, @anton-potapov The error in test_eh_thread is not a oneTBB issue.

Hardware: Raspberry Pi 4 8GB RAM. OS:

  1. Ubuntu 20.04.3 (kernel 5.4.0-1047-raspi) + docker, container: oraclelinux:8.5
  2. Ubuntu 20.04.3 (kernel 5.4.0-1047-raspi) without any containers.

Test:

// g++ -pthread -std=c++17 pthread.cpp && ./a.out && echo OK || echo ERROR

#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

#include <pthread.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>


void limitThreads(size_t limit)
{
    rlimit rlim;

    int ret = getrlimit(RLIMIT_NPROC, &rlim);
    if (ret != 0)
    {
        std::cerr << "getrlimit has returned an error" << std::endl;
        exit(1);
    }

    rlim.rlim_cur = (rlim.rlim_max == (rlim_t)RLIM_INFINITY) ? limit : std::min(limit, rlim.rlim_max);

    ret = setrlimit(RLIMIT_NPROC, &rlim);
    if (ret != 0)
    {
        std::cerr << "setrlimit has returned an error" << std::endl;
        exit(1);
    }
}

static std::mutex m;
static std::condition_variable cv;
static std::atomic<bool> stop{ false };

static void* thread_routine(void*)
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return stop == true; });
    return 0;
}

static void* new_thread_routine(void*)
{
    std::cerr << "sleep" << std::endl;
    sleep(60);
    return 0;
}

class Thread {
    pthread_t mHandle{};
    bool mValid{};
public:
    Thread() {
        mValid = false;
        pthread_attr_t attr;
        // Limit the stack size not to consume all virtual memory on 32 bit platforms.
        if (pthread_attr_init(&attr) == 0 && pthread_attr_setstacksize(&attr, 100*1024) == 0) {
            mValid = pthread_create(&mHandle, &attr, thread_routine, /* arg = */ nullptr) == 0;
        }
    }
    bool isValid() const { return mValid; }
    void join() {
        pthread_join(mHandle, nullptr);
    }
};


void check( int error_code, const char* routine )
{
    if (error_code)
    {
        std::cerr << routine << std::endl;
        _exit(1);
    }
}


int main()
{
    // Some systems set a really big limit (e.g. >45K) for the number of processes/threads
    limitThreads(1024);

    std::thread /* isolate test */ ([] {
        std::vector<Thread> threads;
        stop = false;
        auto finalize = [&] {
            stop = true;
            cv.notify_all();
            for (auto& t : threads) {
                t.join();
            }
        };

        for (int i = 0;; ++i) {
            Thread thread;
            if (!thread.isValid()) {
                break;
            }
            threads.push_back(thread);
            if (i == 1024) {
                std::cerr << "setrlimit seems having no effect" << std::endl;
                finalize();
                return;
            }
        }

        pthread_t new_handle;
        pthread_attr_t s;
        check(pthread_attr_init( &s ), "pthread_attr_init has failed");
        void * arg = nullptr;

        int ec = pthread_create( &new_handle, &s, new_thread_routine, arg );
        if (ec) {
            std::cerr << "EXPECTED ERROR: " << "pthread_create has failed" << std::endl;
        } else {
            std::cerr << "UNEXPECTED OK: " << "pthread_create is not failed" << std::endl;
            _exit(1);
        }
        check( pthread_attr_destroy( &s ), "pthread_attr_destroy has failed" );
        // No pthread_join here: this point is reached only when pthread_create
        // failed, so new_handle does not refer to a joinable thread.

        finalize();
    }).join();

    return 0;
}

Output:

[root@ubuntu ~]# g++ -pthread -std=c++17 pthread.cpp
[root@ubuntu ~]# ./a.out && echo OK || echo ERROR
sleepUNEXPECTED OK: pthread_create is not failed
ERROR

Maybe a bug in glibc or in the Ubuntu 20.04 kernel? ... Or in my test?

phprus avatar Dec 22 '21 14:12 phprus

Maybe if (pthread_attr_init(&attr) == 0 && pthread_attr_setstacksize(&attr, 100*1024) == 0) failed (not pthread_create) when we tried to create 1024 threads?
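One way to check this hypothesis is to report which call failed instead of folding every failure into a single boolean. The following is a minimal sketch, assuming a hypothetical `spawn_with_stack` helper and `SpawnResult` type; neither is part of oneTBB or of the reproducer above.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <pthread.h>
#include <string>

// Hypothetical result type: names the call that failed, or nullptr on success.
struct SpawnResult {
    const char* failed_call;
    int error_code;
};

// Hypothetical helper: creates a thread with the requested stack size and
// keeps the three possible failure points apart.
inline SpawnResult spawn_with_stack(pthread_t* handle, void* (*routine)(void*),
                                    void* arg, std::size_t stack_size) {
    pthread_attr_t attr;
    int ec = pthread_attr_init(&attr);
    if (ec != 0) {
        return {"pthread_attr_init", ec};
    }
    ec = pthread_attr_setstacksize(&attr, stack_size);
    if (ec != 0) {
        pthread_attr_destroy(&attr);
        return {"pthread_attr_setstacksize", ec};
    }
    ec = pthread_create(handle, &attr, routine, arg);
    pthread_attr_destroy(&attr);
    if (ec != 0) {
        return {"pthread_create", ec};
    }
    return {nullptr, 0};
}

// Formats a result as "call: strerror text", or "ok" on success.
inline std::string describe(const SpawnResult& r) {
    if (r.failed_call == nullptr) {
        return "ok";
    }
    return std::string(r.failed_call) + ": " + std::strerror(r.error_code);
}
```

If the stack size is the problem, calling `spawn_with_stack` with `100*1024` should name `pthread_attr_setstacksize` on the very first thread, rather than `pthread_create` after many threads have been created.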

alex-katranov avatar Dec 22 '21 17:12 alex-katranov

@alexey-katranov YES!!! pthread_attr_setstacksize returns EINVAL.

https://man7.org/linux/man-pages/man3/pthread_attr_setstacksize.3.html:

EINVAL The stack size is less than PTHREAD_STACK_MIN bytes. On some systems, pthread_attr_setstacksize() can fail with the error EINVAL if stacksize is not a multiple of the system page size.

PTHREAD_STACK_MIN is defined in <limits.h>.

On aarch64, PTHREAD_STACK_MIN == 131072, or it is a dynamic value: sysconf(_SC_THREAD_STACK_MIN). The test requests 100*1024 = 102400 bytes, which is below that minimum.
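A possible way to avoid the EINVAL is to raise the requested size to at least PTHREAD_STACK_MIN and round it up to a page multiple before calling pthread_attr_setstacksize. The sketch below is only an illustration: `safe_stack_size` and `create_small_stack_thread` are hypothetical names, and this is not necessarily how PR #697 fixes the test.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits.h>   // PTHREAD_STACK_MIN
#include <pthread.h>
#include <unistd.h>   // sysconf, _SC_PAGESIZE

// Hypothetical helper: never go below PTHREAD_STACK_MIN and round the result
// up to a multiple of the page size, since some systems require that.
inline std::size_t safe_stack_size(std::size_t requested) {
    const std::size_t min_size = static_cast<std::size_t>(PTHREAD_STACK_MIN);
    const long page = sysconf(_SC_PAGESIZE);
    // Assumption: fall back to 4096 if the page size cannot be queried.
    const std::size_t page_size = page > 0 ? static_cast<std::size_t>(page) : 4096;
    const std::size_t size = std::max(requested, min_size);
    return (size + page_size - 1) / page_size * page_size;
}

// Hypothetical helper: creates a thread with a small but valid stack.
// Returns 0 on success or the error code of the first failing call.
inline int create_small_stack_thread(pthread_t* handle, void* (*routine)(void*),
                                     void* arg, std::size_t requested) {
    pthread_attr_t attr;
    int ec = pthread_attr_init(&attr);
    if (ec != 0) {
        return ec;
    }
    ec = pthread_attr_setstacksize(&attr, safe_stack_size(requested));
    if (ec == 0) {
        ec = pthread_create(handle, &attr, routine, arg);
    }
    pthread_attr_destroy(&attr);
    return ec;
}
```

With this in place, a request of `100*1024` bytes would be raised to at least 131072 on the aarch64 systems described above, while platforms with a smaller minimum keep a value close to the requested one.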

phprus avatar Dec 22 '21 18:12 phprus

@isaevil PR #697 fixes only the test_eh_thread test; test_composite_node, test_concurrent_vector and test_resumable_tasks are not fixed by it. Please reopen this issue.

phprus avatar Dec 23 '21 14:12 phprus

@phprus I guess the issue was closed automatically because you mentioned it. Reopening.

isaevil avatar Dec 23 '21 14:12 isaevil

test_resumable_tasks on Raspberry Pi 4.

Release build:

[root@ubuntu gcc85-release]# ./gnu_8.5_cxx17_64_release/test_resumable_tasks
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/test/tbb/test_resumable_tasks.cpp:431:
TEST CASE:  Nested arena

/root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/test/tbb/test_resumable_tasks.cpp:431: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:       2 |       0 passed | 2 failed | 3 skipped
[doctest] assertions: 7000011 | 7000011 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault (core dumped)

[root@ubuntu gcc85-release]# gdb ./gnu_8.5_cxx17_64_release/test_resumable_tasks core
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-16.0.4.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./gnu_8.5_cxx17_64_release/test_resumable_tasks...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 105481]
[New LWP 105488]
[New LWP 105476]
[New LWP 105487]
[New LWP 105489]
[New LWP 105490]
[New LWP 105482]
[New LWP 105484]
[New LWP 105486]
[New LWP 105485]
[New LWP 105483]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./gnu_8.5_cxx17_64_release/test_resumable_tasks'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000042ebfc in void tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) ()
[Current thread is 1 (Thread 0xffff88662010 (LWP 105481))]
(gdb) bt
#0  0x000000000042ebfc in void tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) ()
#1  0x000000000042f36c in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#2  0x0000ffff8b17d260 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#3  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 2
[Switching to thread 2 (Thread 0xffff83e45010 (LWP 105488))]
#0  0x0000ffff8add95f8 in mprotect () at ../sysdeps/unix/syscall-template.S:78
78      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000ffff8add95f8 in mprotect () at ../sysdeps/unix/syscall-template.S:78
#1  0x0000ffff8b173d50 in tbb::detail::r1::create_coroutine(tbb::detail::r1::coroutine_type&, unsigned long, void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#2  0x0000ffff8b1790c8 in tbb::detail::r1::suspend(void (*)(void*, tbb::detail::r1::suspend_point_type*), void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#3  0x000000000042db2c in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostInnerParFor, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#4  0x0000ffff8b17d260 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#5  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 3
[Switching to thread 3 (Thread 0xffff8b1f3010 (LWP 105476))]
#0  __GI___mmap64 (offset=<optimized out>, fd=-1, flags=131106, prot=0, len=139264, addr=<optimized out>) at ../sysdeps/unix/sysv/linux/mmap64.c:52
52        return (void *) MMAP_CALL (mmap, addr, len, prot, flags, fd, offset);
(gdb) bt
#0  __GI___mmap64 (offset=<optimized out>, fd=-1, flags=131106, prot=0, len=139264, addr=<optimized out>) at ../sysdeps/unix/sysv/linux/mmap64.c:52
#1  __GI___mmap64 (addr=<optimized out>, len=139264, prot=0, flags=131106, fd=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:40
#2  0x0000ffff8b173d3c in tbb::detail::r1::create_coroutine(tbb::detail::r1::coroutine_type&, unsigned long, void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#3  0x0000ffff8b1790c8 in tbb::detail::r1::suspend(void (*)(void*, tbb::detail::r1::suspend_point_type*), void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#4  0x000000000042e440 in void tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) ()
#5  0x000000000042e814 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#6  0x0000ffff8b17d260 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#7  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 4
[Switching to thread 4 (Thread 0xffff8aa6a010 (LWP 105487))]
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88        int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xffffef716da8, cond=0xffffef716dd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xffffef716dd8, mutex=0xffffef716da8) at pthread_cond_wait.c:655
#3  0x0000ffff8b05d738 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /usr/src/debug/gcc-8.5.0-4.0.1.el8_5.aarch64/obj-aarch64-redhat-linux/aarch64-redhat-linux/libstdc++-v3/include/aarch64-redhat-linux/bits/gthr-default.h:864
#4  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#5  0x00000000004277dc in AsyncActivity::asyncLoop(AsyncActivity*) ()
#6  0x0000ffff8b063cf4 in std::execute_native_thread_routine (__p=0x29af1fd0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#7  0x0000ffff8ae878f8 in start_thread (arg=0xffff8b063cd8 <std::execute_native_thread_routine(void*)>) at pthread_create.c:479
#8  0x0000ffff8addd1fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) thread 5
[Switching to thread 5 (Thread 0xffff83dbe010 (LWP 105489))]
#0  0x0000ffff8add95f8 in mprotect () at ../sysdeps/unix/syscall-template.S:78
78      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000ffff8add95f8 in mprotect () at ../sysdeps/unix/syscall-template.S:78
#1  0x0000ffff8b173d50 in tbb::detail::r1::create_coroutine(tbb::detail::r1::coroutine_type&, unsigned long, void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#2  0x0000ffff8b1790c8 in tbb::detail::r1::suspend(void (*)(void*, tbb::detail::r1::suspend_point_type*), void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#3  0x000000000042db2c in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostInnerParFor, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#4  0x0000ffff8b17d260 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#5  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 6
[Switching to thread 6 (Thread 0xffff83be3010 (LWP 105490))]
#0  0x0000ffff8b17c980 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
(gdb) bt
#0  0x0000ffff8b17c980 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#1  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 7
[Switching to thread 7 (Thread 0xffff88641010 (LWP 105482))]
#0  swapcontext () at ../sysdeps/unix/sysv/linux/aarch64/swapcontext.S:90
90              cbnz    x0, 1f
(gdb) bt
#0  swapcontext () at ../sysdeps/unix/sysv/linux/aarch64/swapcontext.S:90
#1  0x0000ffff8b178e24 in tbb::detail::r1::task_dispatcher::resume(tbb::detail::r1::task_dispatcher&) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#2  0x0000ffff8b178ef4 in tbb::detail::r1::suspend(void (*)(void*, tbb::detail::r1::suspend_point_type*), void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#3  0x000000000042db2c in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostInnerParFor, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#4  0x0000ffff8b17c368 in tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#5  0x000000000042e504 in void tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) ()
#6  0x000000000042e814 in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<InnermostArenaBody::InnermostOuterParFor, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#7  0x0000ffff8b17d260 in tbb::detail::r1::co_local_wait_for_all(unsigned int, unsigned int) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#8  0x0000ffff8ad4d140 in ?? () at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123 from /lib64/libc.so.6
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) thread 8
[Switching to thread 8 (Thread 0xffff89267010 (LWP 105484))]
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e04) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88        int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e04) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xffffef716da8, cond=0xffffef716dd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xffffef716dd8, mutex=0xffffef716da8) at pthread_cond_wait.c:655
#3  0x0000ffff8b05d738 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /usr/src/debug/gcc-8.5.0-4.0.1.el8_5.aarch64/obj-aarch64-redhat-linux/aarch64-redhat-linux/libstdc++-v3/include/aarch64-redhat-linux/bits/gthr-default.h:864
#4  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#5  0x00000000004277dc in AsyncActivity::asyncLoop(AsyncActivity*) ()
#6  0x0000ffff8b063cf4 in std::execute_native_thread_routine (__p=0x29af1f90) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#7  0x0000ffff8ae878f8 in start_thread (arg=0xffff8b063cd8 <std::execute_native_thread_routine(void*)>) at pthread_create.c:479
#8  0x0000ffff8addd1fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) thread 9
[Switching to thread 9 (Thread 0xffff8a269010 (LWP 105486))]
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88        int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xffffef716da8, cond=0xffffef716dd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xffffef716dd8, mutex=0xffffef716da8) at pthread_cond_wait.c:655
#3  0x0000ffff8b05d738 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /usr/src/debug/gcc-8.5.0-4.0.1.el8_5.aarch64/obj-aarch64-redhat-linux/aarch64-redhat-linux/libstdc++-v3/include/aarch64-redhat-linux/bits/gthr-default.h:864
#4  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#5  0x00000000004277dc in AsyncActivity::asyncLoop(AsyncActivity*) ()
#6  0x0000ffff8b063cf4 in std::execute_native_thread_routine (__p=0x29af22b0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#7  0x0000ffff8ae878f8 in start_thread (arg=0xffff8b063cd8 <std::execute_native_thread_routine(void*)>) at pthread_create.c:479
#8  0x0000ffff8addd1fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) thread 10
[Switching to thread 10 (Thread 0xffff89a68010 (LWP 105485))]
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88        int err = lll_futex_timed_wait (futex_word, expected, NULL, private);
(gdb) bt
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0xffffef716e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0xffffef716da8, cond=0xffffef716dd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0xffffef716dd8, mutex=0xffffef716da8) at pthread_cond_wait.c:655
#3  0x0000ffff8b05d738 in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>)
    at /usr/src/debug/gcc-8.5.0-4.0.1.el8_5.aarch64/obj-aarch64-redhat-linux/aarch64-redhat-linux/libstdc++-v3/include/aarch64-redhat-linux/bits/gthr-default.h:864
#4  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#5  0x00000000004277dc in AsyncActivity::asyncLoop(AsyncActivity*) ()
#6  0x0000ffff8b063cf4 in std::execute_native_thread_routine (__p=0x29af2410) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#7  0x0000ffff8ae878f8 in start_thread (arg=0xffff8b063cd8 <std::execute_native_thread_routine(void*)>) at pthread_create.c:479
#8  0x0000ffff8addd1fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) thread 11
[Switching to thread 11 (Thread 0xffff88620010 (LWP 105483))]
#0  __ieee754_log10 (x=7000009) at ../sysdeps/ieee754/dbl-64/wordsize-64/e_log10.c:62
62        EXTRACT_WORDS64 (hx, x);
(gdb) bt
#0  __ieee754_log10 (x=7000009) at ../sysdeps/ieee754/dbl-64/wordsize-64/e_log10.c:62
#1  0x0000000000412bd0 in doctest::(anonymous namespace)::ConsoleReporter::test_run_end(doctest::TestRunStats const&) ()
#2  0x0000000000415a58 in doctest::(anonymous namespace)::FatalConditionHandler::handleSignal(int) ()
#3  <signal handler called>
#4  0x000000000042edac in void tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>, tbb::detail::d1::blocked_range<int> >(tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>&, tbb::detail::d1::blocked_range<int>&, tbb::detail::d1::execution_data&) ()
#5  0x000000000042f36c in tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<OutermostArenaBody, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) ()
#6  0x0000ffff8b17ede0 in tbb::detail::r1::market::process(rml::job&) () from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#7  0x0000ffff8b17950c in tbb::detail::r1::rml::private_worker::thread_routine(void*) ()
   from /root/oneTBB-013035b4e9af39f506e87ae6b755c3363e768d4d/build/gcc85-release/gnu_8.5_cxx17_64_release/libtbb.so.12
#8  0x0000ffff8ae878f8 in start_thread (arg=0xffff8b179480 <tbb::detail::r1::rml::private_worker::thread_routine(void*)>) at pthread_create.c:479
#9  0x0000ffff8addd1fc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) thread 12
Unknown thread 12.
(gdb)

The RelWithDebInfo build works without errors.

phprus avatar Dec 24 '21 18:12 phprus

@ulfworsoe is this issue still relevant?

nofuturre avatar Jul 09 '24 06:07 nofuturre

I don't know if it's relevant with the latest oneAPI release. I'll have to check. I'm on vacation at the moment, so if you can leave the issue open, I can run the tests in a couple of weeks.

ulfworsoe avatar Jul 09 '24 08:07 ulfworsoe

Sure, thank you for a quick response

nofuturre avatar Jul 09 '24 11:07 nofuturre

I have rerun the tests. I don't have access to a Graviton at the moment, so they were run on a machine with similar capabilities.

  • System: Ubuntu 20.04.6 LTS
  • Machine: NVIDIA Orin Jetson
  • Repo Tag: v2021.13.0

There is one test that hangs, but I can't say if it is the same issue:

 86/137 Test  #86: conformance_parallel_for .................Subprocess terminated***Exception: 8035.38 sec

This is where it hangs:

#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0x0000ffff9634a9d8 in tbb::detail::r1::futex_wait (comparand=2, futex=0xfffff6a4a870) at /home/ulfw/download/oneTBB/src/tbb/semaphore.h:253
#2  tbb::detail::r1::binary_semaphore::P (this=0xfffff6a4a870) at /home/ulfw/download/oneTBB/src/tbb/semaphore.h:253
#3  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0xfffff6a4a840) at /home/ulfw/download/oneTBB/src/tbb/concurrent_monitor.h:170
#4  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=<optimized out>) at /home/ulfw/download/oneTBB/src/tbb/concurrent_monitor.h:232
#5  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., 
    this=<optimized out>) at /home/ulfw/download/oneTBB/src/tbb/concurrent_monitor.h:262
#6  tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (wakeup_condition=..., uniq_tag=281474819729704, this=0xfffff6a4a7e8) at /home/ulfw/download/oneTBB/src/tbb/waiters.h:133
#7  tbb::detail::r1::external_waiter::pause (this=0xfffff6a4a7e8) at /home/ulfw/download/oneTBB/src/tbb/waiters.h:160
#8  tbb::detail::r1::external_waiter::pause (this=<optimized out>) at /home/ulfw/download/oneTBB/src/tbb/waiters.h:153
#9  tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::external_waiter> (critical_allowed=true, fifo_allowed=true, isolation=0, waiter=..., ed=..., tls=..., this=<optimized out>) at /home/ulfw/download/oneTBB/src/tbb/task_dispatcher.h:232
#10 tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0xffff95e07480) at /home/ulfw/download/oneTBB/src/tbb/task_dispatcher.h:351
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0xffff95e07480) at /home/ulfw/download/oneTBB/src/tbb/task_dispatcher.h:459
#12 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /home/ulfw/download/oneTBB/src/tbb/task_dispatcher.cpp:168
#13 0x0000aaaad48043f0 in tbb::detail::d1::execute_and_wait (w_ctx=..., wait_ctx=..., t_ctx=..., t=...) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/detail/_task.h:191
#14 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned short>, tbb::detail::d1::parallel_for_body_wrapper<TestFunctor<unsigned short>, unsigned short>, tbb::detail::d1::simple_partitioner const>::run (partitioner=..., context=..., body=<synthetic pointer>..., range=<synthetic pointer>...) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:112
#15 tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned short>, tbb::detail::d1::parallel_for_body_wrapper<TestFunctor<unsigned short>, unsigned short>, tbb::detail::d1::simple_partitioner const>::run (partitioner=..., body=<synthetic pointer>..., range=<synthetic pointer>...) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:101
#16 tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<unsigned short>, tbb::detail::d1::parallel_for_body_wrapper<TestFunctor<unsigned short>, unsigned short> > (partitioner=..., body=<synthetic pointer>..., range=<synthetic pointer>...) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:237
#17 tbb::detail::d1::parallel_for_impl<unsigned short, TestFunctor<unsigned short>, tbb::detail::d1::simple_partitioner const> (last=1024, partitioner=..., f=..., step=221, first=0) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:314
#18 tbb::detail::d1::parallel_for_impl<unsigned short, TestFunctor<unsigned short>, tbb::detail::d1::simple_partitioner const> (last=1024, partitioner=..., f=..., step=221, first=0) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:306
#19 tbb::detail::d1::parallel_for<unsigned short, TestFunctor<unsigned short> > (partitioner=..., f=..., step=221, last=1024, first=0) at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/parallel_for.h:328
#20 InvokerStep<parallel_tag, tbb::detail::d1::simple_partitioner const, unsigned short, TestFunctor<unsigned short> >::operator() (this=<synthetic pointer>, first=<synthetic pointer>: <optimized out>, last=<synthetic pointer>: 1024, step=<synthetic pointer>: <optimized out>, p=..., f=...) at /home/ulfw/download/oneTBB/test/conformance/conformance_parallel_for.cpp:148
#21 TestParallelForWithStepSupportHelper<parallel_tag, unsigned short, tbb::detail::d1::simple_partitioner const> (p=...) at /home/ulfw/download/oneTBB/test/conformance/conformance_parallel_for.cpp:210
#22 0x0000aaaad48051d0 in TestParallelForWithStepSupport<parallel_tag, unsigned short> () at /home/ulfw/download/oneTBB/src/tbb/../../include/oneapi/tbb/partitioner.h:629
#23 0x0000aaaad47eea4c in doctest::Context::run (this=<optimized out>) at /home/ulfw/download/oneTBB/test/common/doctest.h:7060
#24 0x0000aaaad47d361c in main (argc=<optimized out>, argv=<optimized out>) at /home/ulfw/download/oneTBB/test/common/doctest.h:7138

ulfworsoe avatar Jul 30 '24 10:07 ulfworsoe