corert icon indicating copy to clipboard operation
corert copied to clipboard

sched_yield infinite loop

Open yunekit opened this issue 6 years ago • 11 comments
trafficstars

The sched_yield()==0 infinite loop occurred during the running of the application, causing the process to block and fail to work.

yunekit avatar Oct 07 '19 08:10 yunekit

The application lost its response, and it was observed with strace that an infinite loop occurred during the call to sched_yield, suspected to occur during the GC.

yunekit avatar Oct 07 '19 08:10 yunekit

Might this be due to https://github.com/dotnet/corert/issues/7807#issuecomment-537646340 ?

Suchiman avatar Oct 07 '19 09:10 Suchiman

Thank you. The specific cause of this problem has not yet been found. In my source code, all loops have other calls, but the problem thread that has an infinite loop has only sched_yield calls.

yunekit avatar Oct 07 '19 12:10 yunekit

Could you please share the stacktrace that leads to the sched_yield()==0 infinite loop that you are seeing?

jkotas avatar Oct 07 '19 14:10 jkotas

The stack information obtained using GDB is as follows:

Thread 7 (Thread 0x7fd4cafd0700 (LWP 4739)):
#0  0x00007fd4f53aa827 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000004611e6 in PalSwitchToThread () at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:571
#2  0x0000000000421455 in ThreadStore::SuspendAllThreads (this=<optimized out>, waitForGCEvent=<optimized out>, fireDebugEvent=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:279
#3  0x000000000042fed5 in WKS::GCHeap::GarbageCollectGeneration (this=<optimized out>, gen=<optimized out>, reason=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:36525
#4  0x0000000000431e62 in WKS::gc_heap::try_allocate_more_space (acontext=<optimized out>, size=<optimized out>, flags=<optimized out>, gen_number=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:13915
#5  0x0000000000455db0 in WKS::gc_heap::allocate_more_space (size=<optimized out>, flags=0, alloc_generation_number=0, acontext=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14350
#6  WKS::gc_heap::allocate (jsize=<optimized out>, acontext=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14381
#7  WKS::GCHeap::Alloc (this=<optimized out>, context=<optimized out>, size=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:35807
#8  0x0000000000417dab in RhpGcAlloc (pEEType=<optimized out>, uFlags=0, cbSize=24, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/gcrhenv.cpp:288
#9  0x000000000046de9c in RhpNewObject () at /root/corert_3114860/src/Native/Runtime/unix/unixasmmacrosamd64.inc:429
#10 0x0000000000624f59 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=81, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:154
#11 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#12 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#13 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#14 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#15 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4cafd0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

After the thread 7 completes the program transaction, it triggers the GC action. After suspending all the threads, PalSwitchToThread() --> sched_yield () enters the infinite loop.

yunekit avatar Oct 08 '19 12:10 yunekit

This thread is waiting for other threads to reach GC safe point. Could you please check the stacktraces of the the other threads? You should fine a thread that is running inside a managed code and that this thread is waiting on. That other thread running inside managed code is the problem.

jkotas avatar Oct 08 '19 15:10 jkotas

This app is a web service program. It runs very stable on the mono and .net core platforms.

Since the GC keeps all threads in a suspended state, other threads cannot continue to work. All threads are as follows:

Thread 14 (Thread 0x7fd4aafd0700 (LWP 4753)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000799646 in S_P_CoreLib_System_Threading_LowLevelMonitor__Acquire ()
#5  0x0000000000859ab8 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#6  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#7  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#8  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#9  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#10 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#11 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#12 0x00007fd4f56976ba in start_thread (arg=0x7fd4aafd0700) at pthread_create.c:333
#13 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 13 (Thread 0x7fd4c97a0700 (LWP 4752)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#5  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#6  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#7  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#8  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#9  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#10 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#11 0x00007fd4f56976ba in start_thread (arg=0x7fd4c97a0700) at pthread_create.c:333
#12 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 12 (Thread 0x7fd4ab7e0700 (LWP 4751)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait ()
#5  0x00000000007991f4 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__WaitForSignal ()
#6  0x0000000000799033 in S_P_CoreLib_System_Threading_LowLevelLifoSemaphore__Wait ()
#7  0x0000000000858f4f in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WaitForRequest ()
#8  0x0000000000858d61 in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart ()
#9  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread ()
#10 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#11 0x00007fd4f56976ba in start_thread (arg=0x7fd4ab7e0700) at pthread_create.c:333
#12 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7fd4abff0700 (LWP 4744)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x000000000046fd8b in CLREventStatic::Wait (this=0x33adbc0, dwMilliseconds=<optimized out>, bAlertable=false, bAllowReentrantWait=false) at /root/corert_3114860/src/Native/Runtime/event.cpp:105
#3  0x0000000000421673 in ThreadStore::WaitForSuspendComplete (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:322
#4  0x00000000004208d3 in Thread::WaitForSuspend (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:77
#5  RhpWaitForSuspend2 () at /root/corert_3114860/src/Native/Runtime/thread.cpp:946
#6  0x000000000079ad1c in S_P_CoreLib_System_Threading_Thread__Yield ()
#7  0x0000000000859d91 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__UninterruptibleSleep0 ()
#8  0x00000000007ad276 in S_P_CoreLib_System_Threading_SpinWait__SpinOnceCore ()
#9  0x00000000007ab43f in S_P_CoreLib_System_Threading_SemaphoreSlim__Wait_4 ()
#10 0x00000000005fe173 in jwsd_JNet_Sockets_SocketCloseHelper__DoClose () at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/SocketCloseHelper.cs:204
#11 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#12 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#13 0x00007fd4f56976ba in start_thread (arg=0x7fd4abff0700) at pthread_create.c:333
#14 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7fd4c8f90700 (LWP 4743)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x00000000005010b1 in System_IO_FileSystem_Interop_Sys__LStat_0 () at /_/src/Common/src/CoreLib/System/Runtime/Versioning/NonVersionableAttribute.cs:29
#5  0x0000000000501c27 in System_IO_FileSystem_Interop_Sys__LStat_1 (path=..., output=...) at /_/src/Common/src/Interop/Unix/System.Native/Interop.Stat.Span.cs:36
#6  0x00000000005b5943 in System_IO_FileSystem_System_IO_FileSystem__FileExists_0 (fullPath=..., fileType=32768, errorInfo=...) at /_/src/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs:41
#7  0x0000000000558acd in System_IO_FileSystem_System_IO_File__Exists (path=...) at /_/src/System.IO.FileSystem/src/System/IO/File.cs:136
#8  0x0000000000639e88 in jwsd_jwsHttpd_SiteHostManager__TargetFilePathAndName (this=..., urlPath=..., pPathAndName=..., newUrlPath=..., toDivFix=@0x7fd4c8f8f928: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:791
#9  0x0000000000639632 in jwsd_jwsHttpd_SiteHostManager__TryGetTargetFilePathAndName (this=..., rdt=..., toDirFix=@0x7fd4c8f8fa18: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:661
#10 0x000000000062a865 in jwsd_jwsHttpd_RequestWorker__RealRun (reqss=..., rdt=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:357
#11 0x0000000000629ae4 in jwsd_jwsHttpd_RequestWorker__Run (reqs=..., rdate=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:28
#12 0x0000000000624c2d in jwsd_jwsHttpd_RequestDataPreprocessor__ParseData_0 (reqs=..., handlerSize=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:110
#13 0x00000000006251a9 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:253
#14 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#15 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#16 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#17 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#18 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#19 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#20 0x00007fd4f56976ba in start_thread (arg=0x7fd4c8f90700) at pthread_create.c:333
#21 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7fd4c9fb0700 (LWP 4741)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x00000000004ff941 in System_IO_FileSystem_Interop_Sys__Stat_0 () at /_/artifacts/obj/System.IO.Compression.Brotli/netcoreapp-Unix-Release/System.SR.cs:20
#5  0x0000000000500337 in System_IO_FileSystem_Interop_Sys__Stat_1 (path=..., output=...) at /_/src/Common/src/Interop/Unix/System.Native/Interop.Stat.Span.cs:24
#6  0x00000000005b5927 in System_IO_FileSystem_System_IO_FileSystem__FileExists_0 (fullPath=..., fileType=16384, errorInfo=...) at /_/src/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs:41
#7  0x00000000005280ba in System_IO_FileSystem_System_IO_Directory__Exists (path=...) at /_/src/System.IO.FileSystem/src/System/IO/Directory.cs:64
#8  0x000000000063a38d in jwsd_jwsHttpd_SiteHostManager__UrlToPhysicsPath (this=..., urlPath=..., isDir=@0x7fd4c9faf870: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:908
#9  0x0000000000639d86 in jwsd_jwsHttpd_SiteHostManager__TargetFilePathAndName (this=..., urlPath=..., pPathAndName=..., newUrlPath=..., toDivFix=@0x7fd4c9faf928: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:760
#10 0x0000000000639632 in jwsd_jwsHttpd_SiteHostManager__TryGetTargetFilePathAndName (this=..., rdt=..., toDirFix=@0x7fd4c9fafa18: false) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/SiteHostManager.cs:661
#11 0x000000000062a865 in jwsd_jwsHttpd_RequestWorker__RealRun (reqss=..., rdt=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:357
#12 0x0000000000629ae4 in jwsd_jwsHttpd_RequestWorker__Run (reqs=..., rdate=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestWorker.cs:28
#13 0x0000000000624c2d in jwsd_jwsHttpd_RequestDataPreprocessor__ParseData_0 (reqs=..., handlerSize=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:110
#14 0x00000000006251a9 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:253
#15 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#16 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#17 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#18 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#19 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#20 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#21 0x00007fd4f56976ba in start_thread (arg=0x7fd4c9fb0700) at pthread_create.c:333
#22 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7fd4ca7c0700 (LWP 4740)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x0000000000859e16 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo(long,...)(...)( *) static () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#6  0x0000000000857572 in S_P_CoreLib_System_Threading_ClrThreadPool_GateThread__GateThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#7  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#8  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#9  0x00007fd4f56976ba in start_thread (arg=0x7fd4ca7c0700) at pthread_create.c:333
#10 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fd4cafd0700 (LWP 4739)):
#0  0x00007fd4f53aa827 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000004611e6 in PalSwitchToThread () at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:571
#2  0x0000000000421455 in ThreadStore::SuspendAllThreads (this=<optimized out>, waitForGCEvent=<optimized out>, fireDebugEvent=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:279
#3  0x000000000042fed5 in WKS::GCHeap::GarbageCollectGeneration (this=<optimized out>, gen=<optimized out>, reason=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:36525
#4  0x0000000000431e62 in WKS::gc_heap::try_allocate_more_space (acontext=<optimized out>, size=<optimized out>, flags=<optimized out>, gen_number=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:13915
#5  0x0000000000455db0 in WKS::gc_heap::allocate_more_space (size=<optimized out>, flags=0, alloc_generation_number=0, acontext=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14350
#6  WKS::gc_heap::allocate (jsize=<optimized out>, acontext=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:14381
#7  WKS::GCHeap::Alloc (this=<optimized out>, context=<optimized out>, size=<optimized out>, flags=<optimized out>) at /root/corert_3114860/src/Native/Runtime/Full/../../gc/gc.cpp:35807
#8  0x0000000000417dab in RhpGcAlloc (pEEType=<optimized out>, uFlags=0, cbSize=24, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/gcrhenv.cpp:288
#9  0x000000000046de9c in RhpNewObject () at /root/corert_3114860/src/Native/Runtime/unix/unixasmmacrosamd64.inc:429
#10 0x0000000000624f59 in jwsd_jwsHttpd_RequestDataPreprocessor__RecvComplete (sck=..., size=81, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/HttpdServer/RequestDataPreprocessor.cs:154
#11 0x00000000005f88f8 in jwsd_JNet_Sockets_JwsSocketReader__SocketReaderNetEvent (sck=..., eve=<optimized out>, o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/Sockets/JwsSocketReader.cs:228
#12 0x00000000005cf337 in jwsd_JNet_EpollNetDataMonitor__SocketWorkThreadFunc (this=..., o=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:545
#13 0x00000000007afdae in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\FacetChecker.cs:1167
#14 0x0000000000858d4c in S_P_CoreLib_System_Threading_ClrThreadPool_WorkerThread__WorkerThreadStart () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#15 0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4cafd0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7fd4cb7e0700 (LWP 4738)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x000000000094e1eb in <PInvokeDelegateWrapper>jwsd_JNet_TcpAccept_delegateAccept__<ForwardNativeFunctionWrapper>jwsd_JNet_TcpAccept_delegateAccept () at /_/src/System.Net.Http/src/System/Net/Http/Headers/ObjectCollection.cs:15
#5  0x00000000005ea8fd in jwsd_JNet_TcpAccept__AcceptLoop (this=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/TcpAccept.cs:168
#6  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#7  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#8  0x00007fd4f56976ba in start_thread (arg=0x7fd4cb7e0700) at pthread_create.c:333
#9  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7fd4cbff0700 (LWP 4737)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x000000000079d0f9 in S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\ContentValidator.cs:573
#6  0x00000000007b2d65 in S_P_CoreLib_System_Threading_TimerQueue__TimerThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\NamespaceList.cs:159
#7  0x000000000079b2cd in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#8  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#9  0x00007fd4f56976ba in start_thread (arg=0x7fd4cbff0700) at pthread_create.c:333
#10 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7fd4d8b90700 (LWP 4736)):
#0  0x00007fd4f53c7a13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000094e160 in <PInvokeDelegateWrapper>jwsd_JNet_EpollNetDataMonitor_DelegateEpollWait__<ForwardNativeFunctionWrapper>jwsd_JNet_EpollNetDataMonitor_DelegateEpollWait () at /_/src/System.Net.Http/src/System/Net/Http/Headers/ObjectCollection.cs:15
#2  0x00000000005cb061 in jwsd_JNet_EpollNetDataMonitor__ThreadWaitEpollEvent (this=..., idObj=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:179
#3  0x000000000079b28d in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#4  0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#5  0x00007fd4f56976ba in start_thread (arg=0x7fd4d8b90700) at pthread_create.c:333
#6  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fd4d93a0700 (LWP 4735)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x000000000046fd8b in CLREventStatic::Wait (this=0x33adbc0, dwMilliseconds=<optimized out>, bAlertable=false, bAllowReentrantWait=false) at /root/corert_3114860/src/Native/Runtime/event.cpp:105
#3  0x0000000000421673 in ThreadStore::WaitForSuspendComplete (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/threadstore.cpp:322
#4  0x00000000004208d3 in Thread::WaitForSuspend (this=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:77
#5  RhpWaitForSuspend2 () at /root/corert_3114860/src/Native/Runtime/thread.cpp:946
#6  0x0000000000507392 in System_Console_Interop_Sys__Write () at /mnt/e/Jexus-Core-V1.0/JwsShared/JexusShreadFunctions.cs:55
#7  0x0000000000575bc4 in System_Console_System_ConsolePal__Write_0 (fd=..., bufPtr=0x7fd4dbec6448 "Epoll出现可读写事件并纳入工作线程,socket:8, 状态:CanRead\nRead\n", count=75, mayChangeCursorPosition=true) at /_/src/System.Console/src/System/ConsolePal.Unix.cs:1246
#8  0x000000000057566c in System_Console_System_ConsolePal__Write (fd=..., buffer=..., offset=<optimized out>, count=<optimized out>, mayChangeCursorPosition=<optimized out>) at /_/src/System.Console/src/System/ConsolePal.Unix.cs:1238
#9  0x0000000000809791 in S_P_CoreLib_System_IO_StreamWriter__Flush_0 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#10 0x000000000080a54d in S_P_CoreLib_System_IO_StreamWriter__WriteFormatHelper () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#11 0x000000000080a6a9 in S_P_CoreLib_System_IO_StreamWriter__WriteLine_2 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\XsdDuration.cs:658
#12 0x000000000086740e in S_P_CoreLib_System_IO_TextWriter_SyncTextWriter__WriteLine_16 () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#13 0x00000000005cec09 in jwsd_JNet_EpollNetDataMonitor__ToWorkCallback (this=..., info_id=<optimized out>, value=<optimized out>, fromRead=<optimized out>) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:514
#14 0x00000000005cb1a5 in jwsd_JNet_EpollNetDataMonitor__ThreadWaitEpollEvent (this=..., idObj=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JNet/EpollNetDataMonitor.cs:239
#15 0x000000000079b28d in S_P_CoreLib_System_Threading_Thread__StartThread () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:183
#16 0x000000000079b779 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Schema\BitSet.cs:249
#17 0x00007fd4f56976ba in start_thread (arg=0x7fd4d93a0700) at pthread_create.c:333
#18 0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fd4dbe80700 (LWP 4734)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000461f01 in UnixEvent::Wait (this=<optimized out>, milliseconds=<optimized out>) at /root/corert_3114860/src/Native/Runtime/unix/PalRedhawkUnix.cpp:297
#2  0x00000000004166c5 in PalWaitForSingleObjectEx (arg1=0x33b34c0, arg2=4294967295, arg3=0) at /root/corert_3114860/src/Native/Runtime/Full/../PalRedhawkFunctions.h:172
#3  FinalizerStart (pContext=0x33b34c0) at /root/corert_3114860/src/Native/Runtime/FinalizerHelpers.cpp:59
#4  0x00007fd4f56976ba in start_thread (arg=0x7fd4dbe80700) at pthread_create.c:333
#5  0x00007fd4f53c741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fd4f6ca0880 (LWP 4733)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000462e11 in GCEvent::Impl::Wait (this=<optimized out>, milliseconds=<optimized out>, alertable=<optimized out>) at /root/corert_3114860/src/Native/gc/unix/events.cpp:159
#2  0x0000000000420930 in Thread::WaitForGC (this=<optimized out>, pTransitionFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:89
#3  RhpWaitForGC2 (pFrame=<optimized out>) at /root/corert_3114860/src/Native/Runtime/thread.cpp:965
#4  0x0000000000859bf9 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#5  0x0000000000859e16 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo(long,...)(...)( *) static () at F:\workspace\_work\1\s\src\System.Private.Xml\src\System\Xml\Dom\XmlNamedNodeMap.SmallXmlNodeList.cs:178
#6  0x00000000006460df in jwsd_jwsHttpd_SvrMain__RealMain (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/SvrMain.cs:116
#7  0x000000000063f53c in jwsd_jwsHttpd_JwsWorkerServer__HttpdMain (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsHttpd/JwsWorkerServer.cs:103
#8  0x0000000000689c0e in jwsd_jws_JwsMain__Main1 (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsMain.cs:209
#9  0x0000000000689959 in jwsd_jws_JwsMain__Main (args=...) at /mnt/e/Jexus-Core-V1.0/jwsd/JwsMain.cs:177
#10 0x0000000000952abd in jwsd__Module___StartupCodeMain () at <stdin>:15732480
#11 0x0000000000415be5 in main (argc=1, argv=0x7fffd24ac6e8) at /root/corert_3114860/src/Native/Bootstrap/main.cpp:395

yunekit avatar Oct 08 '19 16:10 yunekit

Hmm, the stack traces do not ring any bells. It looks all normal.

Could you please step through the loop in ThreadStore::SuspendAllThreads and find the Thread* that causes keepWaiting to be set to true? Then look at the m_uPalThreadIdForLogging field of this Thread* to identify the threads that causing the problem via Thread ID.

jkotas avatar Oct 08 '19 16:10 jkotas

thank you for your help. This program uses function calls such as epoll_wait, and I suspect that thread hangs are related to this function.

yunekit avatar Oct 08 '19 16:10 yunekit

Do you see that the loop inside ThreadStore::SuspendAllThreads is waiting for the thread that is calling epoll_wait ?

Would you mind sharing the bit of code in EpollNetDataMonitor.cs that calls the epoll_wait? One possibility is that the hang is caused by a bad PInvoke definition for epoll_wait.

jkotas avatar Oct 08 '19 17:10 jkotas

This problem was temporarily resolved by modifying the epoll_wait call parameter. During the stress test, the program was not found to be stuck.

Old code:

var nfds = epoll_wait(epfd,  events,  max, -1);

New code:

var nfds = epoll_wait(epfd,  events,  max, 100);
if (nfds == 0){
    Thread.Sleep(1);
    continue;  //goto epoll_wait...
}

Whether this improvement can completely solve the problem, still need to continue to observe.

yunekit avatar Oct 08 '19 17:10 yunekit