runtime icon indicating copy to clipboard operation
runtime copied to clipboard

Test failure baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

Open VincentBu opened this issue 2 years ago • 7 comments

Run: runtime-coreclr gcstress-extra 20220723.1

Failed test:

coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd

coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd

coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)

Return code:      -100
Raw output file:      /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Queued OS Arch Column1
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1
2022-05-29T14:55:06.304Z ubuntu.1804.armarch.open arm runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-12T08:37:37.57Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_zapdisable_jitstress2
2022-05-29T00:10:21.895Z osx.1200.arm64.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1

VincentBu avatar Jul 25 '22 02:07 VincentBu

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

Failed again in: runtime-coreclr gcstress-extra 20220731.1

Failed Test:

coreclr Linux arm64 Checked gcstress0xc_jitstress2 @ (Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8-20210531091519-97d8652

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7f7b5e5b68)

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)

Return code:      -100
Raw output file:      /root/helix/work/workitem/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 1758 corerun
[createdump] Crashing thread 000006de signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.1758.dmp
[createdump] Written 219217920 bytes (53520 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 3569ms

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/root/helix/work/correlation
/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

VincentBu avatar Aug 02 '22 01:08 VincentBu

Queued OS Arch Column1
2022-08-01T10:39:41.436Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-08-01T00:31:13.607Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitstress2
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1

VincentBu avatar Aug 02 '22 01:08 VincentBu

@mikem8361 This test intentionally crashes and then it seems to hang on the close(parent_pipe); line in the PROCCreateCrashDump function. That line was added in #69663. Could you please take a look?

libpthread_2_27+0x11d5c
libcoreclr!PROCCreateCrashDump+0x338 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2460] 
libcoreclr!PROCCreateCrashDumpIfEnabled+0x5d0 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2637] 
libcoreclr!PROCAbort+0x20 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2664] 
libcoreclr!PROCEndProcess+0x304 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1346] 
libcoreclr!TerminateProcess+0x90 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 1241] 
libcoreclr!SafeExitProcess+0x180
libcoreclr!EEPolicy::HandleFatalError+0x190 [/__w/1/s/src/coreclr/vm/eepolicy.cpp @ 797] 
libcoreclr!SystemNative::GenericFailFast+0x334 [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 291] 
libcoreclr!SystemNative::FailFast+0x10c [/__w/1/s/src/coreclr/classlibnative/bcltype/system.cpp @ 310] 

AntonLapounov avatar Aug 07 '22 22:08 AntonLapounov

It looks like it is blocked in the waitpid at line 2459. We are waiting for createdump to finish generating the core dump and exit. Is there a createdump process still around? Maybe createdump is taking too long to finish. Is the crashing process large?

2458 int wstatus = 0; 2459 int result = waitpid(childpid, &wstatus, 0); 2460 if (result != childpid) 2461 {

libpthread_2_27+0x11d5c

libcoreclr!PROCCreateCrashDump+0x338 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2460]

libcoreclr!PROCCreateCrashDumpIfEnabled+0x5d0 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2637]

mikem8361 avatar Aug 07 '22 22:08 mikem8361

Sorry, I was looking at the PR diff for the line numbers not the current process.cpp source. You are right that line 2460 in the current source is the close(parent_pipe). I need to investigate why close on this pipe hangs.

mikem8361 avatar Aug 07 '22 22:08 mikem8361

I think you are actually correct, because those line numbers are for the 20220731.1 build. The crashing process should be small as we run this test individually.

AntonLapounov avatar Aug 07 '22 23:08 AntonLapounov

If the stack trace is from a core dump generated by createdump, the crashing thread will be waiting at that point on the waitpid.

Isn't the failure from whatever called the FailFast? I don't think this issue is createdump related.

mikem8361 avatar Aug 08 '22 17:08 mikem8361

This test intentionally calls Environment.FailFast: https://github.com/dotnet/runtime/blob/9865cc7a89a4a613a47986b8268a10bc37adbd65/src/tests/baseservices/exceptions/simple/ParallelCrash.cs#L37 The question is why the test hangs indefinitely after that call:

cmdLine:/root/helix/work/workitem/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 8/1/2022 12:40:15 AM, end: 8/1/2022 2:10:15 AM)

#69663 was merged on May 26th. According to the table above, the test started to time out on May 29th (2022-05-29). Could that be related?

AntonLapounov avatar Aug 08 '22 18:08 AntonLapounov

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Issue Details

Run: runtime-coreclr gcstress-extra 20220723.1

Failed test:

coreclr Linux x64 Checked gcstress0xc_jitminopts_heapverify1 @ Ubuntu.1804.Amd64.Open

- baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh

coreclr OSX arm64 Checked gcstress0xc_jitminopts_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_d/lcs2_d.cmd

coreclr OSX arm64 Checked gcstress0xc_jitstress1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Arrays/lcs/lcs2_r/lcs2_r.cmd

coreclr OSX arm64 Checked gcstress0xc_zapdisable_heapverify1 @ OSX.1200.ARM64.Open

- JIT/Methodical/Boxing/boxunbox/KeepAliveBoxOpt/KeepAliveBoxOpt.cmd

Error message:

Process terminated. Parallel crash in main thread
at System.Environment.FailFast(System.String)
at ParallelCrash.Main(System.String[])

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)

Return code:      -100
Raw output file:      /datadisks/disk1/work/AD4109E9/w/9F5508E7/uploads/Reports/baseservices.exceptions/simple/ParallelCrash/ParallelCrash.output.txt
Raw output:
BEGIN EXECUTION
/datadisks/disk1/work/AD4109E9/p/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false ParallelCrash.dll ''
[createdump] Gathering state for process 14588 corerun
[createdump] Crashing thread 000038fc signal 00000006
[createdump] Writing minidump with heap to file /home/helixbot/dotnetbuild/dumps/coredump.14588.dmp
[createdump] Written 219791360 bytes (53660 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 1438ms

cmdLine:/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh Timed Out (timeout in milliseconds: 5400000 from variable __TestTimeout, start: 7/24/2022 12:39:10 AM, end: 7/24/2022 2:09:10 AM)
Test Harness Exitcode is : -100
To run the test:

set CORE_ROOT=/datadisks/disk1/work/AD4109E9/p
/datadisks/disk1/work/AD4109E9/w/9F5508E7/e/baseservices/exceptions/simple/ParallelCrash/ParallelCrash.sh
Expected: True
Actual:   False


Stack trace
   at baseservices_exceptions._simple_ParallelCra_ParallelCra_._simple_ParallelCrash_ParallelCrash_sh()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Queued OS Arch Column1
2022-07-24T00:09:19.457Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
2022-07-17T14:50:14.532Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-07-10T11:00:50.23Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_jitminopts_heapverify1
2022-06-26T05:34:58.745Z osx.1200.arm64.open arm64 runtime-coreclr r2r-extra Checked-gcstress0xc
2022-06-19T14:53:47.255Z ubuntu.1804.armarch.open arm64 runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-18T23:07:33.226Z ubuntu.1804.amd64.open.rt x64 runtime-coreclr gcstress-extra Checked-gcstress0xc_zapdisable_heapverify1
2022-05-29T14:55:06.304Z ubuntu.1804.armarch.open arm runtime-coreclr gcstress0x3-gcstress0xc Checked-gcstress0xc
2022-06-12T08:37:37.57Z osx.1200.arm64.open arm64 runtime-coreclr crossgen2-composite gcstress Checked-gcstress0xc_zapdisable_jitstress2
2022-05-29T00:10:21.895Z osx.1200.arm64.open arm64 runtime-coreclr gcstress-extra Checked-gcstress0xc_jitminopts_heapverify1
Author: VincentBu
Assignees: mikem8361
Labels:

arch-arm64, os-linux, os-mac-os-x, GCStress, arch-x64, area-Diagnostics-coreclr, area-ExceptionHandling-coreclr, blocking-clean-ci-optional

Milestone: 7.0.0

msftbot[bot] avatar Aug 08 '22 21:08 msftbot[bot]

Mike, I see this in an old log:

Process terminated. Parallel crash in main thread
   at System.Environment.FailFast(System.String)
   at ParallelCrash.Main(System.String[])
apply_reg_state: ip and cfa unchanged; stopping here (ip=0x7fa6a5a538)

Looks like it comes from libunwind.

hoyosjs avatar Aug 09 '22 19:08 hoyosjs

How can I see more recent failures (with a runtime with my new logging) with this test?

mikem8361 avatar Aug 12 '22 17:08 mikem8361

How can I see more recent failures (with a runtime with my new logging) with this test?

Here's a runfo query that should provide the latest failures: https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+definition%3A662+name%3AParallelCrash.sh. You can change the timeframe by altering "started" to filter within a specific date range.

tommcdon avatar Aug 12 '22 17:08 tommcdon

According to kusto, last failure was 2022-08-07T08:53:38.06Z, so nothing since your PR went in.

hoyosjs avatar Aug 12 '22 17:08 hoyosjs

Very likely a duplicate of #57621. Please reopen if you hit this again.

AntonLapounov avatar Aug 13 '22 00:08 AntonLapounov

Reopening, I did not check dates properly.

AntonLapounov avatar Aug 13 '22 00:08 AntonLapounov

This hasn't reproed since 8/7; moving to future until we have something to make it actionable.

hoyosjs avatar Aug 15 '22 22:08 hoyosjs

ParallelCrash - no failures in the last 3 weeks, closing.

tommcdon avatar Aug 30 '22 15:08 tommcdon