GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32

Open runfoapp[bot] opened this issue 2 years ago • 14 comments

Failures - 4/5-8/5 (incl. PRs) via Runfo - last 120 days

  • Two failures on 7/21; however, these occurred on arm64 and x64 (the first was PR #72229), and there was nothing further until 7/28
  • Failures on arm start on 7/28
  • Starting on ~8/4 the frequency escalates from ~1/day -> ~40/day

Potentially related: Failures for https://github.com/dotnet/runtime/issues/73433 also start on arm on 7/28, ~5 hours before this test first fails.

Runfo Tracking Issue: GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32

Build Definition Kind Run Name Console Core Dump Test Results Run Client
1926124 runtime PR 73434 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1926111 runtime PR 73471 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1926071 runtime PR 73365 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925997 runtime PR 72934 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925987 runtime PR 73476 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925982 runtime PR 73475 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925978 runtime PR 72450 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925959 runtime PR 73216 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925920 runtime PR 73186 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925877 runtime PR 73466 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925814 runtime PR 73310 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925793 runtime PR 73424 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925746 runtime PR 73469 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925625 runtime PR 73449 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925595 runtime PR 73318 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925578 runtime PR 73380 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925421 runtime PR 73416 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925417 runtime PR 73416 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925374 runtime PR 73373 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925213 runtime PR 71705 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925177 runtime PR 73451 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925173 runtime PR 73277 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925006 runtime PR 73435 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925005 runtime Rolling coreclr Linux arm Checked @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1925005 runtime Rolling coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924937 runtime PR 73434 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924869 runtime PR 73363 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924858 runtime PR 73424 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924797 runtime PR 73359 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924767 runtime PR 73428 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924758 runtime PR 73427 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924745 runtime PR 73320 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924726 runtime PR 73426 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924707 runtime PR 73400 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924659 runtime PR 73423 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924609 runtime PR 73363 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924484 runtime PR 73186 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924474 runtime PR 73283 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924444 runtime PR 73365 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924411 runtime PR 73216 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924347 runtime PR 73416 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924260 runtime PR 73399 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924253 runtime PR 66743 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924198 runtime PR 73408 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924181 runtime PR 73406 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924142 runtime PR 72961 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924138 runtime PR 73009 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924099 runtime PR 73404 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924029 runtime PR 68629 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1924017 runtime PR 71203 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923945 runtime PR 73174 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923937 runtime PR 73073 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923921 runtime PR 73318 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923912 runtime PR 73383 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923908 runtime Rolling coreclr Linux arm Checked @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923908 runtime Rolling coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1923806 runtime PR 73397 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1920967 runtime PR 73032 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py
1916590 runtime PR 73032 coreclr Linux arm Checked no_tiered_compilation @ (Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440 console.log core dump runclient.py

Build Result Summary

Day Hit Count Week Hit Count Month Hit Count
39 57 57

runfoapp[bot] avatar Aug 02 '22 17:08 runfoapp[bot]

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details

Runfo Creating Tracking Issue (data being generated)

Author: runfoapp[bot]
Assignees: -
Labels:

area-GC-coreclr

Milestone: -

msftbot[bot] avatar Aug 02 '22 17:08 msftbot[bot]

This has pretty big impact on CI - causing 4 failures per day - @mangod9 @cshung can you please disable it ASAP? Note: First occurrence was in @cshung's PR - see updated top post.

karelz avatar Aug 05 '22 17:08 karelz

@mangod9 given the hit rate, I'll disable this test while it gets investigated

hoyosjs avatar Aug 05 '22 17:08 hoyosjs

PR to disable: https://github.com/dotnet/runtime/pull/73477

hoyosjs avatar Aug 05 '22 17:08 hoyosjs

I have investigated it. The failure is pretty strange. We crash because the current thread's Thread::m_pFrame is 0xffffffff when the NDirectImportWorker attempts to execute the following assert: https://github.com/dotnet/runtime/blob/3238c2a0e94a11d83cbfed0945b07b3d6f717c71/src/coreclr/vm/dllimport.cpp#L5852-L5853

So GetFrame() returns 0xffffffff, which is the special value FRAME_TOP. However, at that point there should be an explicit frame, so it looks like it somehow got popped off.
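To make the observation above concrete, here is a toy model of an explicit frame chain that ends in a FRAME_TOP sentinel. It is only an illustrative sketch in Python, not the runtime's C++ code: the names FRAME_TOP, m_pFrame and GetFrame come from the comment above, while the Frame and Thread classes, push, pop and has_explicit_frame are hypothetical helpers. It assumes the chain is a singly linked list whose empty state is the sentinel.

```python
# Toy model only; the real logic lives in CoreCLR's C++ sources.
FRAME_TOP = 0xFFFFFFFF  # sentinel value quoted in the comment (32-bit process)


class Frame:
    """Hypothetical stand-in for an explicit frame; `next` links toward FRAME_TOP."""

    def __init__(self, name, next_frame):
        self.name = name
        self.next = next_frame


class Thread:
    """Hypothetical stand-in for a thread that owns a frame chain."""

    def __init__(self):
        self.m_pFrame = FRAME_TOP  # empty chain

    def push(self, name):
        self.m_pFrame = Frame(name, self.m_pFrame)

    def pop(self):
        assert self.m_pFrame != FRAME_TOP, "pop on an empty frame chain"
        self.m_pFrame = self.m_pFrame.next

    def get_frame(self):
        return self.m_pFrame


def has_explicit_frame(thread):
    """True when at least one explicit frame is on the chain."""
    return thread.get_frame() != FRAME_TOP


thread = Thread()
thread.push("InlinedCallFrame")
healthy = has_explicit_frame(thread)  # frame present, as the code expects
thread.pop()                          # simulates an unexpected extra pop
broken = has_explicit_frame(thread)   # chain is back at FRAME_TOP
```

In this model, `healthy` corresponds to the state the code expects and `broken` to the state seen in the dump, where the top of the chain is the sentinel rather than a real frame.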

This is the call stack:

* thread #1, name = 'corerun', stop reason = signal SIGABRT
    frame #0: 0xef390f04 libpthread.so.0`__libc_do_syscall at libc-do-syscall.S:46
    frame #1: 0xef38fcaa libpthread.so.0`__waitpid at waitpid.c:30
    frame #2: 0xef38fc94 libpthread.so.0`__waitpid(pid=2388, stat_loc=0xef3e1bac, options=0)
    frame #3: 0xef01bb94 libcoreclr.so`PROCCreateCrashDump(argv=<unavailable>, errorMessageBuffer=0x00000000, cbErrorMessageBuffer=0) at process.cpp:2459:22
    frame #4: 0xef01c51e libcoreclr.so`::PROCCreateCrashDumpIfEnabled(signal=<unavailable>) at process.cpp:2635:9
    frame #5: 0xeefc8a7e libcoreclr.so`invoke_previous_action(action=0xef09c174, code=11, siginfo=0xef3e1c90, context=0xef3e1d10, signalRestarts=<unavailable>) at signal.cpp:431:5
    frame #6: 0xeefc80e0 libcoreclr.so`sigsegv_handler(code=11, siginfo=0xef3e1c90, context=0xef3e1d10) at signal.cpp:638:5
    frame #7: 0xef0f2760 libc.so.6 at sigrestorer.S:77
    frame #8: 0xeed09674 libcoreclr.so`::NDirectImportWorker(NDirectMethodDesc *) [inlined] InlinedCallFrame::~InlinedCallFrame(this=0xffc7b418) at frames.h:2963:5
  * frame #9: 0xeed09670 libcoreclr.so`::NDirectImportWorker(NDirectMethodDesc *) [inlined] InlinedCallFrame::GetMethodFrameVPtr() at frames.h:2963
    frame #10: 0xeed09670 libcoreclr.so`::NDirectImportWorker(pMD=0xec38d0d4) at dllimport.cpp:5851
    frame #11: 0xeee7d254 libcoreclr.so`NDirectImportThunk at asmhelpers.S:247
    frame #12: 0xed7e0468
    frame #13: 0xeee7d122 libcoreclr.so`CallDescrWorkerInternal at asmhelpers.S:73
    frame #14: 0xeecd0fe0 libcoreclr.so`CallDescrWorker(pCallDescrData=0xffc7ba40) at callhelpers.cpp:120:5
    frame #15: 0xeecd0e94 libcoreclr.so`CallDescrWorkerWithHandler(pCallDescrData=0xffc7ba40, fCriticalCall=NO) at callhelpers.cpp:67:5
    frame #16: 0xeecd16a6 libcoreclr.so`MethodDescCallSite::CallTargetWorker(this=<unavailable>, pArguments=0xffc7bb18, pReturnValue=<unavailable>, cbReturnValue=<unavailable>) at callhelpers.cpp:562:9
    frame #17: 0xeeb9c400 libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) [inlined] MethodDescCallSite::Call_RetArgSlot(this=0xffc7bb4c, pArguments=0xffc7bb18) at callhelpers.h:458:9
    frame #18: 0xeeb9c3c6 libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1367
    frame #19: 0xeeb9c26a libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) [inlined] RunMain(this=<unavailable>, pParam=<unavailable>)::$_0::operator()(Param*) const::'lambda'(Param*)::operator()(Param*) const at assembly.cpp:1435
    frame #20: 0xeeb9c26a libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1437
    frame #21: 0xeeb9c212 libcoreclr.so`RunMain(pFD=<unavailable>, numSkipArgs=<unavailable>, piRetVal=<unavailable>, stringArgs=<unavailable>) at assembly.cpp:1437
    frame #22: 0xeeb9c6f8 libcoreclr.so`Assembly::ExecuteMainMethod(this=<unavailable>, stringArgs=0xffc7be9c, waitForOtherThreads=YES) at assembly.cpp:1562:18
    frame #23: 0xeebcd2be libcoreclr.so`CorHost2::ExecuteAssembly(this=<unavailable>, dwAppDomainId=<unavailable>, pwzAssemblyPath=<unavailable>, argc=<unavailable>, argv=0x00000000, pReturnValue=0xffc7bf9c) at corhost.cpp:360:39
    frame #24: 0xeeb88a0c libcoreclr.so`::coreclr_execute_assembly(hostHandle=0x01a1fe88, domainId=1, argc=0, argv=<unavailable>, managedAssemblyPath=0x01a216b0, exitCode=0xffc7bf9c) at exports.cpp:430:24
    frame #25: 0x009567bc corerun`main [inlined] run(config=0xffc7bfc8) at corerun.cpp:368:18
    frame #26: 0x0095590c corerun`main(argc=<unavailable>, argv=<unavailable>) at corerun.cpp:563
    frame #27: 0xef0e3fe6 libc.so.6`__libc_start_main(main=(corerun`main + 1 at corerun.cpp:555), argc=4, argv=0xffc7cbb4, init=<unavailable>, fini=(corerun`__libc_csu_fini + 1), rtld_fini=(ld-linux-armhf.so.3`_dl_fini + 1 at dl-fini.c:50), stack_end=0xffc7cbb4) at libc-start.c:310
    frame #28: 0x00955034 corerun`_start + 52

janvorli avatar Aug 05 '22 22:08 janvorli

The managed frame is obviously Test_GetGCMemoryInfo.Main(); the method being called through the NDirectImportThunk is System.GC._Collect(int, int).

janvorli avatar Aug 05 '22 22:08 janvorli

@janvorli would https://github.com/dotnet/runtime/pull/73491 be enough to fix this, or should I merge the test disable?

hoyosjs avatar Aug 05 '22 23:08 hoyosjs

I don't understand how the #73491 fix would be related to what I am seeing in the dump. But it is possible there are more failure modes.

janvorli avatar Aug 05 '22 23:08 janvorli

I don't understand how the #73491 fix would be related to what I am seeing in the dump. But it is possible there are more failure modes.

The test fails on that #73491 as well, so there must be something else.

cshung avatar Aug 06 '22 01:08 cshung

guess we should disable it then and continue to investigate.

mangod9 avatar Aug 06 '22 01:08 mangod9

I added some notes at the top. Originally the test was observed to fail starting 7/21 but I suspect those failures were unrelated. Arm32 failures started more recently on 7/28 and the failure rate skyrockets around 8/3-8/4.

Issue https://github.com/dotnet/runtime/issues/73433 also starts failing on arm32 on the same day, about 5 hours before this one starts failing.

noahfalk avatar Aug 06 '22 04:08 noahfalk

https://github.com/dotnet/runtime/commit/ef5779719e35093f3e4a6bb910ecdaf594ac32b0 is likely the first commit where it started failing in main branch. If reverting that change fixes it we should do it and reopen https://github.com/dotnet/runtime/issues/60343.

am11 avatar Aug 06 '22 08:08 am11

@am11 what led you to that ibc commit? I see that one was checked in around 4:41 pm PST on the 4th, but that seems later than I would expect based on the failure timeline. I think the culprit may be https://github.com/dotnet/runtime/commit/4c07f3dbe5a439e5c6ac66fd14c4fed51aa819ce

This appears to be the timeline...

Sporadic failures early in the week:

  • 7/28 1910538 (log)
  • 7/29 1912185 (log)
  • 8/1 1916590 (log)
  • 8/3 1920967 (log)

Somewhere around 12:19 pm on the 4th, the failure rate increases dramatically for jobs queued after that point. All of these jobs failed on the GetGCMemoryInfo test:

  • 1923806 8/4 12:19 pm
  • 1923908 8/4 1:00 pm
  • 1923912 8/4 1:03 pm
  • 1923921 8/4 1:08 pm
  • 1923937 8/4 1:10 pm
  • 1923945 8/4 1:11 pm
  • 1924017 8/4 1:45 pm
  • 1924029 8/4 1:50 pm
  • 1924099 8/4 2:20 pm
  • ...

https://github.com/dotnet/runtime/commit/4c07f3dbe5a439e5c6ac66fd14c4fed51aa819ce was committed at 11:54am. The PR for that change did fail the GetGCMemoryInfo test (it was build 1920967 from 8/3 above), but because of the three earlier failures between 7/29-8/1 it may have looked like a pre-existing issue.
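The timeline reasoning above can be sketched as a small script that buckets the listed failures by day and splits them around the commit time. This is only a rough illustration: the build ids and times are copied from this comment, the list is partial (the original ends in "..."), the year 2022 is taken from the comment dates, builds listed without a time of day are assumed to be at 00:00, and all times are assumed to be in the same time zone and the 11:54am commit is assumed to be on 8/4.

```python
from collections import Counter
from datetime import datetime

# (build id, queue time) pairs copied from the timeline above (partial list).
FAILURES = [
    (1910538, "2022-07-28 00:00"),
    (1912185, "2022-07-29 00:00"),
    (1916590, "2022-08-01 00:00"),
    (1920967, "2022-08-03 00:00"),
    (1923806, "2022-08-04 12:19"),
    (1923908, "2022-08-04 13:00"),
    (1923912, "2022-08-04 13:03"),
    (1923921, "2022-08-04 13:08"),
    (1923937, "2022-08-04 13:10"),
    (1923945, "2022-08-04 13:11"),
    (1924017, "2022-08-04 13:45"),
    (1924029, "2022-08-04 13:50"),
    (1924099, "2022-08-04 14:20"),
]

# Assumed commit time of 4c07f3d (11:54am on 8/4, same time zone as above).
COMMIT_TIME = datetime(2022, 8, 4, 11, 54)


def parse(ts):
    """Parse the 'YYYY-MM-DD HH:MM' strings used in FAILURES."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


# Failures per calendar day.
per_day = Counter(parse(ts).date().isoformat() for _, ts in FAILURES)

# Builds queued before and after the suspected commit.
before = [build for build, ts in FAILURES if parse(ts) < COMMIT_TIME]
after = [build for build, ts in FAILURES if parse(ts) >= COMMIT_TIME]

for day in sorted(per_day):
    print(day, per_day[day])
print("before commit:", len(before), "after commit:", len(after))
```

With this partial list, four failures fall before the commit (spread over a week) and nine fall within a few hours after it.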

Can someone else double check this makes sense? (heads up @jkoritzinsky)

noahfalk avatar Aug 06 '22 13:08 noahfalk

@noahfalk, thanks for root causing it. I took a quick glance at the History tab in AzDO and saw the first red bar was pointing to the PR of that commit. I am not sure how reliable that GUI is.

I think it would probably make sense to revert whichever change has caused the failure, so that particular change can be reworked (instead of disabling the test).

am11 avatar Aug 06 '22 13:08 am11

The failure here is not just a linux arm issue. See this (sadly Microsoft internal only) kusto query. On Windows it shows up less frequently and as an assert firing on win-arm (asserts are off on linux-arm). As Noah observed, there's a sharp rise on

image

The first few are different asserts from other PRs that, for the most part, weren't merged. The first PR that saw the regression was https://github.com/dotnet/runtime/pull/73032; after that it was the IBC PR, which already contained the former PR. Looks like Noah got this one right :)

hoyosjs avatar Aug 08 '22 18:08 hoyosjs

On Windows we end up unable to unwind for the assert:

Assert failure(PID 6972 [0x00001b3c], Thread: 2124 [0x084c]): Consistency check failed: AV in clr at this callstack:
------
<no module>! <no symbol> + 0x0 (0x00000000)
-----
.AV on tid=0x84c (2124), cxr=045EE528, exr=045EE6C8
FAILED: false

<no module>! <no symbol> + 0x0 (0x00000000)
    File: D:\a\_work\1\s\src\coreclr\vm\excep.cpp Line: 7183
    Image: D:\h\w\B3F1098A\p\corerun.exe

hoyosjs avatar Aug 08 '22 18:08 hoyosjs

We will evaluate whether the revert resolves this issue.

mangod9 avatar Aug 08 '22 21:08 mangod9

Test disabled in #73477 - not blocking CI anymore. Adjusting labels accordingly.

karelz avatar Aug 09 '22 12:08 karelz

@hoyosjs guess we should re-enable again since pinvoke inlining PR is reverted?

mangod9 avatar Aug 09 '22 16:08 mangod9

Noah already re-enabled it in PR #73595

hoyosjs avatar Aug 09 '22 16:08 hoyosjs

ok nice, so good to close this issue then?

mangod9 avatar Aug 09 '22 16:08 mangod9

I linked the PR for the reenable so that once merged this just gets closed.

hoyosjs avatar Aug 09 '22 17:08 hoyosjs

@noahfalk can you please confirm that you believe / verified that PR #73032 by @jkoritzinsky caused this regression and made tests fail? You reverted that in PR #73551 and re-enabled the test tracked in this issue in PR #73595.

Did the PR cause other problems in the related issues on arm32? (aka is it the GC Hole @jkotas asked you to track down?)

(sorry, there are too many links and comments that I understand only half-way, so I'm trying to clarify it ...)

karelz avatar Aug 10 '22 10:08 karelz

@noahfalk can you please confirm that you believe / verified that PR https://github.com/dotnet/runtime/pull/73032 by @jkoritzinsky caused this regression and made tests fail?

Yes, there was a large increase in failure rate shortly after 73032 was checked in and from the looks of it there have been no failures in the last 18 hours where the test was re-enabled but 73032 isn't present. 73032 also failed the GetGCMemoryInfo test in the initial PR when it was merged and passed in the change that reverted it. I did not do any direct analysis of the code change in 73032, all of the conclusions are based on test results with and without 73032 present.

Did the PR cause other problems in the related issues on arm32?

@hoyosjs pointed out some other failures he found that were correlated with the change.

(aka is it the GC Hole @jkotas asked you to track down?)

Yes I treat 73032 as being that issue. Jan also referenced some ARM failures which predated last week and 73032 wasn't checked in at the time those occurred so it couldn't have caused them. I did not investigate them independently once they appeared to be disconnected from the major failure source.

sorry, there are too many links and comments that I understand only half-way, so I'm trying to clarify it ...

No worries, hope this helps clear it up a bit : )

noahfalk avatar Aug 10 '22 12:08 noahfalk