itwinjs-core icon indicating copy to clipboard operation
itwinjs-core copied to clipboard

intermittent hanging in rush cover (on linux)

Open MichaelBelousov opened this issue 2 years ago • 8 comments

Describe the bug rush cover hangs intermittently, confirmed on linux, don't know if it's been seen anywhere else

To Reproduce

git clone https://github.com/itwin/itwinjs-core.git
cd itwinjs-core
rush install
rush build
rush webpack:test
# run enough times to catch it
for i in {1..200}; do rush cover; done

Expected behavior rush cover ends in a reasonable amount of time, e.g. ever

Desktop:

  • OS: ubuntu linux
  • Version 21.04
  • iTwin.js Version: 3.4.0-dev.3 (3f85b180d67e3183ee9f0194d6dd8c49ed319970)

Additional context

Main thread:

(gdb) bt
#0  lll_mutex_unlock_optimized (mutex=0x4d7b4c0) at pthread_mutex_unlock.c:44
#1  __GI___pthread_mutex_unlock_usercnt (decr=1, mutex=0x4d7b4c0) at pthread_mutex_unlock.c:69
#2  ___pthread_mutex_unlock (mutex=0x4d7b4c0) at pthread_mutex_unlock.c:369
#3  0x0000000001571bf9 in uv_mutex_unlock (mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:351
#4  0x0000000000b7970d in node::NodePlatform::DrainTasks(v8::Isolate*) ()
#5  0x0000000000a4656a in node::FreeEnvironment(node::Environment*) ()
#6  0x0000000000b4b93f in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) ()
#7  0x0000000000acd3f2 in node::Start(int, char**) ()
#8  0x00007f5a6dedffd0 in __libc_start_call_main (main=main@entry=0xa3d930 <main>, argc=argc@entry=7, argv=argv@entry=0x7ffdab890888)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#9  0x00007f5a6dee007d in __libc_start_main_impl (main=0xa3d930 <main>, argc=7, argv=0x7ffdab890888, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7ffdab890878) at ../csu/libc-start.c:409
#10 0x0000000000a400ec in _start ()

all threads:


Thread 11 (Thread 0x7f5a5d7fa640 (LWP 1309712) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4972c28 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4972bc0 <mutex>, cond=0x4972c00 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at ../deps/uv/src/unix/thread.c:780
#6  0x000000000155e4f4 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 10 (Thread 0x7f5a5dffb640 (LWP 1309711) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4972c28 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4972bc0 <mutex>, cond=0x4972c00 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at ../deps/uv/src/unix/thread.c:780
#6  0x000000000155e4f4 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 9 (Thread 0x7f5a5e7fc640 (LWP 1309710) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4972c28 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4972bc0 <mutex>, cond=0x4972c00 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at ../deps/uv/src/unix/thread.c:780
#6  0x000000000155e4f4 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 8 (Thread 0x7f5a5effd640 (LWP 1309709) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4972c28 <cond+40>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4972c28 <cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4972bc0 <mutex>, cond=0x4972c00 <cond>) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=cond@entry=0x4972c00 <cond>, mutex=mutex@entry=0x4972bc0 <mutex>) at ../deps/uv/src/unix/thread.c:780
#6  0x000000000155e4f4 in worker (arg=0x0) at ../deps/uv/src/threadpool.c:76
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 7 (Thread 0x7f5a6e423640 (LWP 1308522) "node"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at futex-internal.c:139
#3  0x00007f5a6df52a3f in do_futex_wait (sem=sem@entry=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, abstime=0x0, clockid=0) at sem_waitcommon.c:112
#4  0x00007f5a6df52ad8 in __new_sem_wait_slow64 (sem=sem@entry=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>, abstime=0x0, clockid=0) at sem_waitcommon.c:184
#5  0x00007f5a6df52b51 in __new_sem_wait (sem=sem@entry=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at sem_wait.c:43
#6  0x0000000001572012 in uv__sem_wait (sem=<optimized out>) at ../deps/uv/src/unix/thread.c:626
#7  uv_sem_wait (sem=0x4967180 <node::inspector::(anonymous namespace)::start_io_thread_semaphore>) at ../deps/uv/src/unix/thread.c:682
#8  0x0000000000c13bd5 in node::inspector::(anonymous namespace)::StartIoThreadMain(void*) ()
#9  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#10 0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 6 (Thread 0x7f5a677fe640 (LWP 1308516) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4d7b514) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4d7b514) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4d7b514, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4d7b4c0, cond=0x4d7b4e8) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=0x4d7b4e8, mutex=0x4d7b4c0) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:780
#6  0x0000000000b75eeb in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 5 (Thread 0x7f5a67fff640 (LWP 1308515) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4d7b514) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4d7b514) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4d7b514, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4d7b4c0, cond=0x4d7b4e8) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=0x4d7b4e8, mutex=0x4d7b4c0) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:780
#6  0x0000000000b75eeb in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 4 (Thread 0x7f5a6ceae640 (LWP 1308514) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4d7b514) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4d7b514) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4d7b514, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4d7b4c0, cond=0x4d7b4e8) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=0x4d7b4e8, mutex=0x4d7b4c0) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:780
#6  0x0000000000b75eeb in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 3 (Thread 0x7f5a6d6af640 (LWP 1308513) "node"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x4d7b514) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x4d7b514) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x4d7b514, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x00007f5a6df49940 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x4d7b4c0, cond=0x4d7b4e8) at pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=0x4d7b4e8, mutex=0x4d7b4c0) at pthread_cond_wait.c:628
#5  0x00000000015721d9 in uv_cond_wait (cond=<optimized out>, mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:780
#6  0x0000000000b75eeb in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#7  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#8  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 2 (Thread 0x7f5a6deb0640 (LWP 1308512) "node"):
#0  0x00007f5a6dfdad8e in epoll_wait (epfd=9, events=events@entry=0x7f5a6deacce0, maxevents=maxevents@entry=1024, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000000001575c07 in uv__io_poll (loop=loop@entry=0x4d7b938, timeout=<optimized out>) at ../deps/uv/src/unix/epoll.c:236
#2  0x0000000001563c88 in uv_run (loop=0x4d7b938, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:389
#3  0x0000000000b7b02b in node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*) ()
#4  0x00007f5a6df4a947 in start_thread (arg=<optimized out>) at pthread_create.c:435
#5  0x00007f5a6dfdaa44 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Thread 1 (Thread 0x7f5a6deb17c0 (LWP 1308473) "node"):
#0  0x0000000000b79716 in node::NodePlatform::DrainTasks(v8::Isolate*) ()
#1  0x0000000000a4656a in node::FreeEnvironment(node::Environment*) ()
#2  0x0000000000b4b93f in node::NodeMainInstance::Run(node::EnvSerializeInfo const*) ()
#3  0x0000000000acd3f2 in node::Start(int, char**) ()
#4  0x00007f5a6dedffd0 in __libc_start_call_main (main=main@entry=0xa3d930 <main>, argc=argc@entry=7, argv=argv@entry=0x7ffdab890888) at ../sysdeps/nptl/libc_start_call_main.h:58
#5  0x00007f5a6dee007d in __libc_start_main_impl (main=0xa3d930 <main>, argc=7, argv=0x7ffdab890888, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdab890878) at ../csu/libc-start.c:409
#6  0x0000000000a400ec in _start ()

MichaelBelousov avatar Aug 10 '22 14:08 MichaelBelousov

It hangs while running the @itwin/appui-react tests.

MichaelBelousov avatar Aug 10 '22 14:08 MichaelBelousov

I am going to try rerunning with a debug build of node overnight tonight to get a more detailed callstack on how node is using libuv

MichaelBelousov avatar Aug 10 '22 14:08 MichaelBelousov

Are you able to provide the test output from @itwin/appui-react to see what specific test it was running?

Looks like possibly same hang as occurred on Windows build agent yesterday. Ran for 1 hour 47 minutes before timing out.

  UiComponents
    √ i18n should throw Error without initialize
    √ i18nNamespace should return UiComponents
    √ packageName should return components-react
    √ translate should return the key (in test environment)
    √ translate should return blank and log error if UiComponents not initialized
    √ calling initialize twice should log
    √ calling loggerCategory without an obj should return packageName


  1975 passing (27s)
  4 pending

Terminate batch job (Y/N)? 
^C
##[error]The operation was canceled.
Finishing: rush cover

pmconne avatar Aug 10 '22 14:08 pmconne

I don't seem to get test output on rush cover, not sure what I have misconfigured. I just have:

Rush Multi-Project Build Tool 5.58.0 - https://rushjs.io
Node.js version is 16.14.0 (LTS)


Starting "rush cover"

Executing a maximum of 12 simultaneous processes...

==[ @itwin/appui-react ]==========================================[ 1 of 28 ]==
WARNING: 4 tests skipped in @itwin/appui-react

Actually, I'm doing rush cover -t @itwin/appui-react in the process that is hanging, but rush cover doesn't get the test output for me either.

MichaelBelousov avatar Aug 10 '22 15:08 MichaelBelousov

Add -v for "verbose".

pmconne avatar Aug 10 '22 15:08 pmconne

I'll restart the loop until I get it then

MichaelBelousov avatar Aug 10 '22 15:08 MichaelBelousov

I have not been able to reproduce the hang after 200 or so runs, about half of rush cover and half of just core-backend tests. I have only reproduced:

  1) ECSqlStatement
       should restart query:

      AssertionError: expected +0 to be at least 1
      + expected - actual

      -0
      +1
      
      at /home/mike/work/itwinjs2/core/backend/src/test/ecdb/ECSqlStatement.test.ts:238:14

Which I will look for/make a separate issue for

MichaelBelousov avatar Aug 12 '22 23:08 MichaelBelousov

In reality I have a core dump from when I first saw the hang running rush cover where I got the posted stacks, so I'll look at that again but I wasn't able to see anything in it the first time I looked, I don't know enough about the functions I see there or libuv

MichaelBelousov avatar Aug 12 '22 23:08 MichaelBelousov

I can no longer reproduce this on master. Need to check if people are still seeing this in rush cover then close.

MichaelBelousov avatar Aug 18 '22 01:08 MichaelBelousov

I can tell you that previously this was affecting probably one of every two PRs and now appears to have ceased completely.

pmconne avatar Aug 18 '22 09:08 pmconne

closing unless there's evidence it's still around

maybe the presentation crash fix fixed this too

MichaelBelousov avatar Aug 18 '22 11:08 MichaelBelousov