runtime
runtime copied to clipboard
Test failure baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Run: runtime-coreclr gcstress-extra 20220723.1
Failed test:
coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open
- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd
coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd
coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd
Error message:
Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Return code: -100
Raw output file: /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms
cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:
set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual: False
Stack trace
at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Failed again in: runtime-coreclr gcstress-extra 20220731.1
Failed Test:
coreclr Linux arm64 Checked gcstress0xc_jitstress2 @ (Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8-20210531091519-97d8652
- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Error message:
Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7f7b5e5b68)
cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)
Return code: -100
Raw output file: /root/helix/work/workitem/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 1758 corerun
[createdump] Crashing thread 000006de signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.1758.dmp
[createdump] Written 219217920 bytes (53520 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 3569ms
cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)
Test Harness Exitcode is : -100
To run the test:
set CORE_ROOT=/root/helix/work/correlation
/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual: False
Stack trace
at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
@mikem8361 This test intentionally crashes and then it seems to hang on the close(parent_pipe);
line in the PROCCreateCrashDump
function. That line was added in #69663. Could you please take a look?
libpthread_2_27+0x11d5c
libcoreclr!PROCCreateCrashDump+0x338 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2460]
libcoreclr!PROCCreateCrashDumpIfEnabled+0x5d0 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2637]
libcoreclr!PROCAbort+0x20 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2664]
libcoreclr!PROCEndProcess+0x304 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1346]
libcoreclr!TerminateProcess+0x90 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1241]
libcoreclr!SafeExitProcess+0x180
libcoreclr!EEPolicy::HandleFatalError+0x190 [/__w/1/s/src/coreclr/vm/eepolicy.cpp @ 797]
libcoreclr!SystemNative::GenericFailFast+0x334 [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 291]
libcoreclr!SystemNative::FailFast+0x10c [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 310]
It looks like it is blocked in the waitpid at line 2459. We are waiting for createdump to finish generating the core dump and exit. Is there a createdump process still around? Maybe createdump is taking too long to finish. Is the crashing process large?
2458 int wstatus = 0; 2459 int result = waitpid(childpid, &wstatus, 0); 2460 if (result != childpid) 2461 {
libpthread_2_27+0x11d5c
libcoreclr!PROCCreateCrashDump+0x338 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2460]
libcoreclr!PROCCreateCrashDumpIfEnabled+0x5d0 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2637]
Sorry, I was looking at the PR diff for the line numbers not the current process.cpp source. You are right that line 2460 in the current source is the close(parent_pipe). I need to investigate why close on this pipe hangs.
I think you are actually correct, because those line numbers are for the 20220731.1 build. The crashing process should be small as we run this test individually.
If the stack trace is from a core dump generated by createdump, the crashing thread will be waiting at that point on the waitpid.
Isn't the failure from whatever called the FailFast? I don't think this issue is createdump related.
This test intentionally calls Environment.FailFast
: https://github.com/dotnet/runtime/blob/9865cc7a89a4a613a47986b8268a10bc37adbd65/src/tests/baseservices/exceptions/simple/ParallelCrash.cs#L37
The question is why the test hangs indefinitely after that call:
cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)
#69663 was merged on May 26th. According to the table above, the test started to time out on May 29th (2022-05-29
). Could that be related?
Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.
Issue Details
Run: runtime-coreclr gcstress-extra 20220723.1
Failed test:
coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open
- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd
coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd
coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open
- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd
Error message:
Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Return code: -100
Raw output file: /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms
cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:
set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual: False
Stack trace
at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Author: | VincentBu |
---|---|
Assignees: | mikem8361 |
Labels: |
|
Milestone: | 7.0.0 |
Mike, I see this in an old log:
Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7fa6a5a538)
Looks like it comes from libunwind.
How can I see more recent failures (with a runtime with my new logging) with this test?
How can I see more recent failures (with a runtime with my new logging) with this test?
Here's a runfo query that should provide the latest failures: https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+definition%3A662+name%3AParallelCrash.sh. You can change the timeframe by altering "started" to filter within a specific date range.
According to kusto, last failure was 2022-08-07T08:53:38.06Z, so nothing since your PR went in.
Very likely a duplicate of #57621. Please reopen if you hit this again.
Reopening, I did not check dates properly.
This hasn't reproed since 8/7; moving to future until we have something to make it actionable.
ParallelCrash - no failures in the last 3 weeks, closing.