openj9 icon indicating copy to clipboard operation
openj9 copied to clipboard

DaaLoadTest_daa1 tests - **FAILED** Process DLT has hung

Open JasonFengJ9 opened this issue 4 years ago • 56 comments

Failure link

From an internal build Test_openjdk11_j9_special.system_ppc64le_linux_testList_2/111/ (ubu18le-rt1-4):

[2021-09-09T02:19:34.473Z] java version "11.0.13-beta" 2021-10-19
[2021-09-09T02:19:34.473Z] IBM Semeru Runtime Certified Edition 11.0.12.0 (build 11.0.13-beta+5-202109090107)
[2021-09-09T02:19:34.473Z] Eclipse OpenJ9 VM 11.0.12.0 (build master-92ccf02b2, JRE 11 Linux ppc64le-64-Bit Compressed References 20210908_164 (JIT enabled, AOT enabled)
[2021-09-09T02:19:34.473Z] OpenJ9   - 92ccf02b2
[2021-09-09T02:19:34.473Z] OMR      - 680b68120
[2021-09-09T02:19:34.473Z] JCL      - e851d14114 based on jdk-11.0.13+5)

Re-run the Grinder

Optional info

Failure output (captured from console output)

[2021-09-09T02:19:28.973Z] Running test DaaLoadTest_daa1_special_5m_4 ...
[2021-09-09T02:19:28.973Z] ===============================================
[2021-09-09T02:19:28.973Z] DaaLoadTest_daa1_special_5m_4 Start Time: Wed Sep  8 19:19:27 2021 Epoch Time (ms): 1631153967861
[2021-09-09T02:19:28.974Z] "/home/jenkins/workspace/Test_openjdk11_j9_special.system_ppc64le_linux_testList_2/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk11_j9_special.system_ppc64le_linux_testList_2/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
[2021-09-09T02:19:28.974Z] JVMSHRC005I No shared class caches available
[2021-09-09T02:19:28.974Z] JVMSHRC005I No shared class caches available
[2021-09-09T02:19:28.974Z] cache cleanup done
[2021-09-09T02:19:28.974Z] variation: Mode112
[2021-09-09T02:19:28.974Z] JVM_OPTIONS:  -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs 

[2021-09-09T02:20:04.486Z] DLT 19:20:03.029 - Final test list:
[2021-09-09T02:20:04.486Z] DLT 19:20:03.129 -   0 ArbitraryJava[net.openj9.test.simple.ConvertDecimal invokeTest]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.130 -   1 ArbitraryJava[net.openj9.test.simple.MarshalUnmarshalBinary invokeTest]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.132 -   2 JUnit[net.openj9.test.arithmetics.TestArithmeticOperations]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.132 -   3 JUnit[net.openj9.test.arithmetics.TestArithmeticInline]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.132 -   4 JUnit[net.openj9.test.arithmetics.TestArithmetics]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   5 JUnit[net.openj9.test.arithmetics.TestComparisonEquals]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   6 JUnit[net.openj9.test.arithmetics.TestPDComparisons]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   7 JUnit[net.openj9.test.arithmetics.TestPerformance]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   8 JUnit[net.openj9.test.arithmetics.TestSubExceptions]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   9 JUnit[net.openj9.test.arithmetics.TestValidityChecking]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.133 -   10 JUnit[net.openj9.test.binaryData.LongIntegerComparison]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   11 JUnit[net.openj9.test.binaryData.TestOptimizer]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   12 JUnit[net.openj9.test.decimals.TestBD2PD2BD]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   13 JUnit[net.openj9.test.decimals.TestDecimalData]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   14 JUnit[net.openj9.test.decimals.TestDecimalData2]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   15 JUnit[net.openj9.test.PD2Primitive.TestPD2Primitives2PD]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   16 JUnit[net.openj9.test.PDMoveShifts.TestShiftsAndConvert]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   17 JUnit[net.openj9.test.PDMoveShifts.TestPDMove]  Weighting=1 
[2021-09-09T02:20:04.486Z] DLT 19:20:03.134 -   18 ArbitraryJava[net.openj9.test.PDMoveShifts.ShiftTestRunner invokeTest]  Weighting=1 
[2021-09-09T02:20:06.734Z] DLT 19:20:05.197 - Starting thread. Suite=0 thread=0
[2021-09-09T02:20:06.734Z] DLT 19:20:05.248 - Starting thread. Suite=0 thread=1
[2021-09-09T02:24:45.299Z] STF 19:24:33.408 - Heartbeat: Process DLT is still running
[2021-09-09T02:29:45.506Z] STF 19:29:33.130 - Heartbeat: Process DLT is still running
[2021-09-09T02:34:46.648Z] STF 19:34:33.380 - Heartbeat: Process DLT is still running
[2021-09-09T02:39:33.486Z] DLT 19:39:22.522 - Completed 100.0%. Number of tests started=2
[2021-09-09T02:39:33.486Z] DLT 19:39:22.523 - **POSSIBLE HANG DETECTED**
[2021-09-09T02:39:33.487Z] STF 19:39:22.601 - **FAILED** Process DLT has hung
[2021-09-09T02:39:33.487Z] STF 19:39:22.601 - Collecting dumps for: DLT
[2021-09-09T02:39:33.487Z] STF 19:39:22.601 - Sending SIG 3 to the java process to generate a javacore

[2021-09-09T02:40:55.178Z] STF 19:40:52.949 - =====================   R E S U L T S   =====================
[2021-09-09T02:40:55.178Z] STF 19:40:52.949 - Stage results:
[2021-09-09T02:40:55.178Z] STF 19:40:52.949 -   setUp:     pass
[2021-09-09T02:40:55.178Z] STF 19:40:52.949 -   execute:  *fail*
[2021-09-09T02:40:55.178Z] STF 19:40:52.949 -   teardown:  pass
[2021-09-09T02:40:55.178Z] STF 19:40:52.949 - 
[2021-09-09T02:40:55.178Z] STF 19:40:52.950 - Overall result: **FAILED**
[2021-09-09T02:40:55.178Z] 
[2021-09-09T02:40:55.178Z] DaaLoadTest_daa1_special_5m_4_FAILED

JasonFengJ9 avatar Sep 09 '21 11:09 JasonFengJ9

Also occurred at JDK17 M1 internal build Test_openjdk17_j9_special.system_ppc64le_linux_Personal/30/tapResults/

===============================================
Running test DaaLoadTest_daa1_special_5m_4 ...
===============================================
DaaLoadTest_daa1_special_5m_4 Start Time: Wed Sep  8 20:00:25 2021 Epoch Time (ms): 1631156425302
variation: Mode112
JVM_OPTIONS:  -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs 

openjdk version "17-internal" 2021-09-14
OpenJDK Runtime Environment (build 17-internal+0-adhoc.jenkins.BuildJDK17ppc64lelinuxPersonal)
Eclipse OpenJ9 VM (build v0.28.0-release-139535d5a37, JRE 17 Linux ppc64le-64-Bit Compressed References 20210908_41 (JIT enabled, AOT enabled)
OpenJ9   - 139535d5a37
OMR      - 407eef0153b
JCL      - b6c1a570c77 based on jdk-17+35)

STF 20:15:29.047 - Heartbeat: Process DLT is still running
DLT 20:17:23.958 - Completed 100.0%. Number of tests started=6
DLT 20:17:52.863 - **POSSIBLE HANG DETECTED**
STF 20:17:54.339 - **FAILED** Process DLT has hung
STF 20:17:54.339 - Collecting dumps for: DLT
STF 20:17:54.339 - Sending SIG 3 to the java process to generate a javacore

STF 20:18:24.347 - Monitoring Report Summary:
STF 20:18:24.347 -   o Process DLT has hung
STF 20:18:24.348 - Killing processes: DLT
STF 20:18:24.348 -   o Process DLT pid 2606789 is not running
**FAILED** at step 1 (Run daa load test). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_Personal_testList_2/aqa-tests/TKG/../TKG/output_16311564227809/DaaLoadTest_daa1_special_5m_4/20210908-200025-DaaLoadTest/execute.pl line 95.
STF 20:18:24.414 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 20:18:24.414 - 
STF 20:18:24.414 - ====================   T E A R D O W N   ====================
STF 20:18:24.414 - Running teardown: perl /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_Personal_testList_2/aqa-tests/TKG/../TKG/output_16311564227809/DaaLoadTest_daa1_special_5m_4/20210908-200025-DaaLoadTest/tearDown.pl
STF 20:18:24.546 - TEARDOWN stage completed
STF 20:18:24.551 - 
STF 20:18:24.551 - =====================   R E S U L T S   =====================
STF 20:18:24.551 - Stage results:
STF 20:18:24.551 -   setUp:     pass
STF 20:18:24.551 -   execute:  *fail*
STF 20:18:24.551 -   teardown:  pass
STF 20:18:24.551 - 
STF 20:18:24.551 - Overall result: **FAILED**

DaaLoadTest_daa1_special_5m_4_FAILED

Since this occurred at JDK17, move it to 0.28.

JasonFengJ9 avatar Sep 09 '21 12:09 JasonFengJ9

Observed in JDK17 0.28 release build at job/Test_openjdk17_j9_special.system_ppc64le_linux_testList_2/55/

openjdk version "17-beta" 2021-09-14
IBM Semeru Runtime Open Edition 17.0.0.0-m1 (build 17-beta+35-202109100228)
Eclipse OpenJ9 VM 17.0.0.0-m1 (build openj9-0.28.0-m1, JRE 17 Linux ppc64le-64-Bit Compressed References 20210909_25 (JIT enabled, AOT enabled)
OpenJ9   - 5eef84fb1
OMR      - 0bf455f83
JCL      - b6c1a570c77 based on jdk-17+35)

[2021-09-10T04:04:43.907Z] STF 21:04:38.170 - Heartbeat: Process DLT is still running
[2021-09-10T04:06:13.501Z] DLT 21:06:04.490 - Completed 100.0%. Number of tests started=6
[2021-09-10T04:06:13.501Z] DLT 21:06:04.490 - **POSSIBLE HANG DETECTED**
[2021-09-10T04:06:13.501Z] DLT 21:06:04.620 - Completed 100.0%. Number of tests started=6 (+0)
[2021-09-10T04:06:13.501Z] DLT 21:06:04.720 - Completed 100.0%. Number of tests started=6 (+0)
[2021-09-10T04:06:13.501Z] DLT 21:06:04.821 - Completed 100.0%. Number of tests started=6 (+0)
[2021-09-10T04:06:13.501Z] STF 21:06:04.871 - **FAILED** Process DLT has hung

[2021-09-10T04:07:07.844Z] STF 21:07:04.886 - Monitoring Report Summary:
[2021-09-10T04:07:07.844Z] STF 21:07:04.886 -   o Process DLT has hung
[2021-09-10T04:07:07.844Z] STF 21:07:04.887 - Killing processes: DLT
[2021-09-10T04:07:07.844Z] STF 21:07:04.887 -   o Process DLT pid 24342 is not running
[2021-09-10T04:07:07.844Z] **FAILED** at step 1 (Run daa load test). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_testList_2/aqa-tests/TKG/../TKG/output_16312451709116/DaaLoadTest_daa1_special_5m_4/20210909-203933-DaaLoadTest/execute.pl line 95.
[2021-09-10T04:07:07.844Z] STF 21:07:05.014 - **FAILED** execute script failed. Expected return value=0 Actual=1
[2021-09-10T04:07:07.844Z] STF 21:07:05.014 - 
[2021-09-10T04:07:07.844Z] STF 21:07:05.014 - ====================   T E A R D O W N   ====================
[2021-09-10T04:07:07.844Z] STF 21:07:05.014 - Running teardown: perl /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_testList_2/aqa-tests/TKG/../TKG/output_16312451709116/DaaLoadTest_daa1_special_5m_4/20210909-203933-DaaLoadTest/tearDown.pl
[2021-09-10T04:07:07.844Z] STF 21:07:05.138 - TEARDOWN stage completed
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 - 
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 - =====================   R E S U L T S   =====================
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 - Stage results:
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 -   setUp:     pass
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 -   execute:  *fail*
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 -   teardown:  pass
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 - 
[2021-09-10T04:07:07.844Z] STF 21:07:05.143 - Overall result: **FAILED**
[2021-09-10T04:07:07.844Z] 
[2021-09-10T04:07:07.844Z] DaaLoadTest_daa1_special_5m_4_FAILED

Also observed at JDK11 job/Test_openjdk11_j9_special.system_ppc64le_linux_testList_4/112/

java version "11.0.13-beta" 2021-10-19
IBM Semeru Runtime Certified Edition 11.0.12.0 (build 11.0.13-beta+5-202109100106)
Eclipse OpenJ9 VM 11.0.12.0 (build master-a2a4729e4, JRE 11 Linux ppc64le-64-Bit Compressed References 20210909_165 (JIT enabled, AOT enabled)
OpenJ9   - a2a4729e4
OMR      - 680b68120
JCL      - e851d14114 based on jdk-11.0.13+5)

[2021-09-10T02:02:48.927Z] Running test DaaLoadTest_daa1_special_5m_1 ...
[2021-09-10T02:02:48.927Z] ===============================================

[2021-09-10T02:02:48.927Z] variation: Mode103
[2021-09-10T02:02:48.927Z] JVM_OPTIONS:  -Xgcpolicy:optthruput -Xjit:count=0 -Xnocompressedrefs 

[2021-09-10T02:23:04.159Z] STF 19:22:52.093 - Heartbeat: Process DLT is still running
[2021-09-10T02:26:08.468Z] DLT 19:26:02.258 - Completed 100.0%. Number of tests started=4
[2021-09-10T02:26:08.468Z] DLT 19:26:05.636 - **POSSIBLE HANG DETECTED**
[2021-09-10T02:26:08.468Z] STF 19:26:05.939 - **FAILED** Process DLT has hung
[2021-09-10T02:26:08.468Z] STF 19:26:05.939 - Collecting dumps for: DLT
[2021-09-10T02:26:08.468Z] STF 19:26:05.939 - Sending SIG 3 to the java process to generate a javacore

[2021-09-10T02:28:09.205Z] DaaLoadTest_daa1_special_5m_1_FAILED

Rerun in Grinder

job/Test_openjdk11_j9_special.system_ppc64le_linux_testList_1/113/

[2021-09-10T02:03:20.432Z] Running test DaaLoadTest_daa1_special_5m_6 ...
[2021-09-10T02:03:20.432Z] ===============================================

[2021-09-10T02:03:21.374Z] variation: Mode121
[2021-09-10T02:03:21.374Z] JVM_OPTIONS:  -Xgcpolicy:optavgpause -Xjit:count=0 -Xnocompressedrefs 

[2021-09-10T02:23:25.415Z] STF 19:23:24.088 - Heartbeat: Process DLT is still running
[2021-09-10T02:23:46.835Z] DLT 19:23:44.197 - Completed 100.0%. Number of tests started=3
[2021-09-10T02:23:46.835Z] DLT 19:23:44.197 - **POSSIBLE HANG DETECTED**
[2021-09-10T02:23:46.835Z] DLT 19:23:44.298 - Completed 100.0%. Number of tests started=3 (+0)
[2021-09-10T02:23:46.835Z] DLT 19:23:44.399 - Completed 100.0%. Number of tests started=3 (+0)
[2021-09-10T02:23:46.835Z] DLT 19:23:44.499 - Completed 100.0%. Number of tests started=3 (+0)
[2021-09-10T02:23:46.835Z] DLT 19:23:44.599 - Completed 100.0%. Number of tests started=3 (+0)
[2021-09-10T02:23:46.835Z] STF 19:23:44.635 - **FAILED** Process DLT has hung
[2021-09-10T02:23:46.835Z] STF 19:23:44.635 - Collecting dumps for: DLT
[2021-09-10T02:23:46.835Z] STF 19:23:44.635 - Sending SIG 3 to the java process to generate a javacore

[2021-09-10T02:24:45.337Z] DaaLoadTest_daa1_special_5m_6_FAILED

JasonFengJ9 avatar Sep 10 '21 11:09 JasonFengJ9

Observed at JDK11 aarch64_linux job/Test_openjdk11_j9_special.system_aarch64_linux_testList_3/73/

[2021-09-14T02:05:40.446Z] Running test DaaLoadTest_daa1_special_5m_23 ...
[2021-09-14T02:05:40.446Z] ===============================================
[2021-09-14T02:05:40.446Z] DaaLoadTest_daa1_special_5m_23 Start Time: Mon Sep 13 22:05:39 2021 Epoch Time (ms): 1631585139681
[2021-09-14T02:05:40.446Z] "/home/jenkins/workspace/Test_openjdk11_j9_special.system_aarch64_linux_testList_3/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk11_j9_special.system_aarch64_linux_testList_3/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
[2021-09-14T02:05:40.446Z] JVMSHRC005I No shared class caches available
[2021-09-14T02:05:40.446Z] JVMSHRC005I No shared class caches available
[2021-09-14T02:05:40.446Z] cache cleanup done
[2021-09-14T02:05:40.446Z] variation: Mode688
[2021-09-14T02:05:40.446Z] JVM_OPTIONS:  -Xcompressedrefs -Xjit:count=0 -Xgcpolicy:gencon -Xaggressive -Xconcurrentlevel0 

[2021-09-14T02:05:44.103Z] openjdk version "11.0.13-ea" 2021-10-19
[2021-09-14T02:05:44.103Z] IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.13-ea+5)
[2021-09-14T02:05:44.103Z] Eclipse OpenJ9 VM 11.0.12.0 (build master-87c301e9d, JRE 11 Linux aarch64-64-Bit Compressed References 20210914_200 (JIT enabled, AOT enabled)
[2021-09-14T02:05:44.103Z] OpenJ9   - 87c301e9d
[2021-09-14T02:05:44.103Z] OMR      - 17c439ee7
[2021-09-14T02:05:44.103Z] JCL      - ddc29ca760 based on jdk-11.0.13+5)

[2021-09-14T02:25:56.527Z] STF 22:25:43.042 - Heartbeat: Process DLT is still running
[2021-09-14T02:30:19.709Z] DLT 22:30:17.223 - Completed 100.0%. Number of tests started=10
[2021-09-14T02:30:24.190Z] DLT 22:30:22.762 - **POSSIBLE HANG DETECTED**
[2021-09-14T02:30:24.190Z] DLT 22:30:22.764 - Thread completed. Suite=0 thread=5
[2021-09-14T02:30:24.190Z] STF 22:30:23.383 - **FAILED** Process DLT has hung

[2021-09-14T02:31:56.113Z] STF 22:31:53.613 - Overall result: **FAILED**
[2021-09-14T02:31:56.113Z] 
[2021-09-14T02:31:56.113Z] DaaLoadTest_daa1_special_5m_23_FAILED

JasonFengJ9 avatar Sep 14 '21 21:09 JasonFengJ9

The message "Completed 100.0%. Number of tests started=10" in the console output suggests this might be a test issue.

As per STF Load Test Readme [1], Load test outputs periodic status reports. As well as showing progress these also prevent any outer test harness which treats output not being written to stdout as a hung test from doing so erroneously. Load test outputs the string 'POSSIBLE HANG DETECTED' if no tests have completed within the previous 15 minutes. Before the process is terminated the diagnostics capture sequence described in 'java dumps' is triggered. If the test is being run on an IBM JVM this should result in 3 sets of java dumps, a core file, snap and jitdump files being created.

It appears that 100% tests have been completed and then determined hung after 15min, which doesn't seem a VM problem. In addition, only javacore files were generated but no system cores.

@Mesbah-Alam @llxia any insights?

I think this is not a 0.28 stop-ship issue, moving to next release instead.

[1] https://github.com/adoptium/STF/blob/master/stf.load/docs/README.md

JasonFengJ9 avatar Sep 15 '21 15:09 JasonFengJ9

@Mesbah-Alam could you take a look into this? Thanks

llxia avatar Sep 27 '21 20:09 llxia

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/8 - aix71-p8-6 DaaLoadTest_daa1_special_5m_20 -Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/8/system_test_output.tar.gz

DLT 02:57:32.806 - Completed 47.4%. Number of tests started=111 (+11)
DLT 02:57:52.805 - Completed 54.1%. Number of tests started=116 (+5)
STF 02:59:48.043 - Heartbeat: Process DLT is still running
STF 03:04:48.230 - Heartbeat: Process DLT is still running
STF 03:09:48.143 - Heartbeat: Process DLT is still running
STF 03:14:48.392 - Heartbeat: Process DLT is still running
STF 03:19:48.229 - Heartbeat: Process DLT is still running
DLT 03:23:16.148 - Completed 100.0%. Number of tests started=119 (+3)
DLT 03:23:16.148 - **POSSIBLE HANG DETECTED**
DLT 03:23:16.283 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:16.403 - Completed 100.0%. Number of tests started=119 (+0)
STF 03:23:16.415 - **FAILED** Process DLT has hung
STF 03:23:16.415 - Collecting dumps for: DLT
STF 03:23:16.416 - Sending SIG 3 to the java process to generate a javacore
STF 03:23:16.418 - Running command: kill -3 19726554
STF 03:23:16.418 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stderr
STF 03:23:16.418 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stdout
STF 03:23:16.452 - Pausing for 30 seconds
STF 03:23:46.454 - Sending SIG 3 to the java process to generate a javacore
STF 03:23:46.465 - Running command: kill -3 19726554
STF 03:23:46.467 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stderr
STF 03:23:46.467 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stdout
STF 03:23:46.495 - Pausing for 30 seconds
STF 03:24:16.497 - Sending SIG 3 to the java process to generate a javacore
STF 03:24:16.511 - Running command: kill -3 19726554
STF 03:24:16.511 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stderr
STF 03:24:16.511 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_3.stdout
STF 03:24:16.543 - Pausing for 30 seconds
STF 03:24:46.544 - Sending SIGABRT (kill -6) to the java process to generate a core
STF 03:24:46.549 - Running command: kill -6 19726554
STF 03:24:46.550 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_6.stderr
STF 03:24:46.550 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_6.stdout
STF 03:24:46.576 - Pausing for 30 seconds
STF 03:25:16.578 - Sending SIGXCPU (kill -24) to the java process to generate an OS dump
STF 03:25:16.579 - Running command: kill -24 19726554
STF 03:25:16.579 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_24.stderr
STF 03:25:16.579 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_2/aqa-tests/TKG/output_16344511419689/DaaLoadTest_daa1_special_5m_20/20211017-025427-DaaLoadTest/results/1.DLT.kill_24.stdout
DLT 03:23:16.788 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:16.943 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.069 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.173 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.275 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.376 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.488 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.601 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.713 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.827 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:17.938 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.039 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.148 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.250 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.353 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.456 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.563 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.688 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.790 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:18.895 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.017 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.130 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.234 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.340 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.441 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.543 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.648 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.813 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:19.918 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.021 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.365 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.468 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.576 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.683 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.786 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:20.889 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.023 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.131 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.242 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.360 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.491 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.600 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.709 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.824 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:21.925 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.029 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.137 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.239 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.347 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.455 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.561 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.671 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.773 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.878 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:22.720 - Thread completed. Suite=0 thread=0
DLT 03:23:22.979 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.087 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.223 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.333 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.446 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.555 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.663 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.764 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.881 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:23.986 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.098 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.210 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.319 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.428 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.534 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.642 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.751 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.867 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:24.971 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:32.840 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:23:52.797 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:24:12.753 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:24:32.757 - Completed 100.0%. Number of tests started=119 (+0)
DLT 03:24:40.179 - Thread completed. Suite=0 thread=3
DLT stderr JVMDUMP039I Processing dump event "user", detail "" at 2021/10/17 03:23:16 - please wait.

pshipton avatar Oct 18 '21 12:10 pshipton

As described in issue #13726, in getCandidate, if the FlushCandidate has a known allocation candidate associated with it, that can be retrieved using FlushCandidate::getCandidate. However, if the FlushCandidate does not have an allocation candidate, getCandidate will iterate through all the allocation candidates.

The Java method that's being compiled - TestDecimalData2.testConvertBigDecimalNormals - has several hundred candidates for stack allocation, several hundred flush candidates, and well over 100000 nodes. For each node examined in TR_LocalFlushElimination::examineNode, it's iterating over the flush candidates, calling getCandidate, and for the several hundred flush candidates that have no associated candidate for stack allocation, it getCandidate is iterating over all the stack allocation candidates every time it is called. The compilation appears to be hanging, but in fact it's just taking a very long time, and can eventually complete.

Caching the fact that a FlushCandidate has no associated candidate for stack allocation appears to resolve the problem. Just testing out that change more thoroughly.

hzongaro avatar Nov 09 '21 00:11 hzongaro

Fixed in pull request #13885

hzongaro avatar Nov 09 '21 22:11 hzongaro

Observed at JDK17 0.29.1 build job/Test_openjdk17_j9_special.system_aarch64_linux_testList_0/85/ (cent7-aarch64-4)

openjdk version "17.0.1" 2021-10-19
IBM Semeru Runtime Open Edition 17.0.1.0 (build 17.0.1+12)
Eclipse OpenJ9 VM 17.0.1.0 (build openj9-0.29.1, JRE 17 Linux aarch64-64-Bit Compressed References 20211207_54 (JIT enabled, AOT enabled)
OpenJ9   - 7d055dfcb
OMR      - e30892e2b
JCL      - fc67fbe50a0 based on jdk-17.0.1+12)

[2021-12-07T23:54:13.038Z] variation: Mode610-OSRG
[2021-12-07T23:54:13.038Z] JVM_OPTIONS:  -Xcompressedrefs -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation -Xgcpolicy:gencon 

[2021-12-08T00:29:20.642Z] STF 19:29:15.474 - Heartbeat: Process DLT is still running
[2021-12-08T00:29:53.414Z] DLT 19:29:51.638 - Completed 100.0%. Number of tests started=6
[2021-12-08T00:29:53.414Z] DLT 19:29:51.638 - **POSSIBLE HANG DETECTED**
[2021-12-08T00:29:53.414Z] DLT 19:29:52.009 - Thread completed. Suite=0 thread=2
[2021-12-08T00:29:53.414Z] DLT 19:29:52.009 - Thread completed. Suite=0 thread=4
[2021-12-08T00:29:53.414Z] DLT 19:29:52.009 - Thread completed. Suite=0 thread=5
[2021-12-08T00:29:53.414Z] STF 19:29:52.043 - **FAILED** Process DLT has hung
[2021-12-08T00:29:53.414Z] STF 19:29:52.044 - Collecting dumps for: DLT

[2021-12-08T00:31:54.507Z] DaaLoadTest_all_special_5m_26_FAILED

The fix https://github.com/eclipse-openj9/openj9/pull/13885 is not in 0.29.1.

fyi @pshipton @0xdaryl

JasonFengJ9 avatar Dec 08 '21 13:12 JasonFengJ9

The fix is in 0.30

pshipton avatar Dec 08 '21 13:12 pshipton

The problem still occurs. The failure is in a 0.30 build.

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_4/16 - aix71-p8-10 DaaLoadTest_daa1_special_5m_20 -Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_special.system_ppc64_aix_Release_testList_4/16/system_test_output.tar.gz

DLT 07:19:16.609 - Completed 40.5%. Number of tests started=170 (+57)
DLT 07:19:36.616 - Completed 47.3%. Number of tests started=205 (+35)
DLT 07:19:56.552 - Completed 54.0%. Number of tests started=208 (+3)
DLT 07:20:16.600 - Completed 60.7%. Number of tests started=208 (+0)
DLT 07:20:36.596 - Completed 67.5%. Number of tests started=209 (+1)
DLT 07:20:56.579 - Completed 74.2%. Number of tests started=209 (+0)
STF 07:21:53.421 - Heartbeat: Process DLT is still running
STF 07:26:53.273 - Heartbeat: Process DLT is still running
STF 07:31:53.235 - Heartbeat: Process DLT is still running
STF 07:36:53.145 - Heartbeat: Process DLT is still running
STF 07:36:53.145 - Heartbeat: Process DLT is still running
DLT 07:37:34.978 - Completed 100.0%. Number of tests started=216 (+7)
DLT 07:37:34.979 - **POSSIBLE HANG DETECTED**
STF 07:37:35.050 - **FAILED** Process DLT has hung
STF 07:37:35.050 - Collecting dumps for: DLT

pshipton avatar Dec 20 '21 15:12 pshipton

@0xdaryl are you ok with moving this out of 0.30 given the issue still occurs post fix?

DanHeidinga avatar Jan 11 '22 15:01 DanHeidinga

I'm just rerunning in grinder to understand whether it's still hanging in the same place or elsewhere now.

hzongaro avatar Jan 11 '22 18:01 hzongaro

I haven't reproduced the suspected hang, but I'm guessing that it likely involves compilation of net/openj9/test/decimals/TestDecimalData2.testConvertBigDecimalNormals()V again. It's a very large method that grows extremely bloated with inlining lots of BigDecimal operations.

Here's some verbose log output. Note the compilation time of 132980487us.

18:13:29  DLT stderr + (warm) net/openj9/test/decimals/TestDecimalData2.testConvertBigDecimalNormals()V @ 0x00000100117D21FC-0x00000100118DDE9C OrdinaryMethod - Q_SZ=6 Q_SZI=1 QW=60 j9m=00000000306217C8 bcsz=11038 sync GCR time=132980487us mem=[region=1155968 system=1163264]KB compThreadID=2 CpuLoad=99%(6%avg) JvmCpu=31%

I'll try to see whether the problem still lies in Escape Analysis or elsewhere.

hzongaro avatar Jan 12 '22 00:01 hzongaro

@DanHeidinga @pshipton : The original problem reported by this issue was investigated and a fix for that issue produced in #13885. It seems this is a second problem related to that pathologically large method. More than 2 minutes for a compilation is already highly abnormal, whether it causes the process to appear to hang or not, and this needs further study. We are not sure which phase(s) of the JIT is causing this and it needs further study.

Henry's opinion is that the test is contrived and is a situation unlikely to appear in the wild. I don't think this should hold up 0.30.

0xdaryl avatar Jan 12 '22 19:01 0xdaryl

Thanks, moving to 0.32.

pshipton avatar Jan 12 '22 19:01 pshipton

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/2 - win2012-x86-6 DaaLoadTest_daa1_special_5m_24 - -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/2/system_test_output.tar.gz

DLT 12:42:44.321 - Completed 100.0%. Number of tests started=158 (+0)
DLT 12:42:55.024 - **POSSIBLE HANG DETECTED**
STF 12:42:55.369 - **FAILED** Process DLT has hung
STF 12:42:55.369 - Collecting dumps for: DLT

pshipton avatar Mar 28 '22 13:03 pshipton

job/Test_openjdk17_j9_sanity.system_aarch64_mac_testList_0/30/(macaarch64rt2 )

java version "17.0.3-ea" 2022-04-19
IBM Semeru Runtime Certified Edition 17.0.3.0-rc1 (build 17.0.3-ea+7)
Eclipse OpenJ9 VM 17.0.3.0-rc1 (build openj9-0.32.0-rc1, JRE 17 Mac OS X aarch64-64-Bit 20220420_20 (JIT enabled, AOT enabled)
OpenJ9   - 9a84ec34e
OMR      - ab24b6666
JCL      - 0e919023eb9 based on jdk-17.0.3+7)

[2022-04-21T05:39:29.555Z] variation: Mode110
[2022-04-21T05:39:29.555Z] JVM_OPTIONS:  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs 

[2022-04-21T06:07:14.792Z] DLT 02:06:36.700 - Load test completed
[2022-04-21T06:07:14.792Z] DLT 02:06:36.701 -   Ran     : 8790
[2022-04-21T06:07:14.792Z] DLT 02:06:36.701 -   Passed  : 8790
[2022-04-21T06:07:14.792Z] DLT 02:06:36.701 -   Failed  : 0
[2022-04-21T06:07:14.792Z] DLT 02:06:36.701 -   Result  : PASSED
[2022-04-21T06:07:14.792Z] STF 02:06:36.793 - **FAILED** Process DLT has hung
[2022-04-21T06:07:14.792Z] STF 02:06:36.793 - Collecting dumps for: DLT


[2022-04-21T06:07:14.792Z] DaaLoadTest_daa1_5m_0_FAILED

JasonFengJ9 avatar Apr 22 '22 18:04 JasonFengJ9

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/10 DaaLoadTest_daa1_special_5m_24

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/10/system_test_output.tar.gz

STF 12:55:03.380 - Heartbeat: Process DLT is still running
DLT 12:55:16.940 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:55:36.956 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:55:56.878 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:56:16.894 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:56:36.941 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:56:56.987 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:57:16.894 - Completed 100.0%. Number of tests started=114 (+0)
DLT 12:57:29.284 - **POSSIBLE HANG DETECTED**
STF 12:57:29.319 - **FAILED** Process DLT has hung

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_2/87 DaaLoadTest_daa1_special_5m_24

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_2/87/system_test_output.tar.gz

pshipton avatar May 16 '22 13:05 pshipton

An internal build(macaarch64rt5)

java version "11.0.16-ea" 2022-07-19
IBM Semeru Runtime Certified Edition 11.0.16.0-rc1 (build 11.0.16-ea+8)
Eclipse OpenJ9 VM 11.0.16.0-rc1 (build openj9-0.33.0-rc1, JRE 11 Mac OS X aarch64-64-Bit 20220719_68 (JIT enabled, AOT enabled)
OpenJ9   - 8cb527940
OMR      - b58aa2708
JCL      - 8b2d4875e9 based on jdk-11.0.16+8)

[2022-07-19T21:44:19.347Z] variation: -Xgcpolicy:gencon -Xgc:concurrentScavenge
[2022-07-19T21:44:19.347Z] JVM_OPTIONS:  -Xgcpolicy:gencon -Xgc:concurrentScavenge 

[2022-07-19T22:03:33.898Z] DLT 18:02:56.298 - Completed 100.0%. Number of tests started=5676 (+900)
[2022-07-19T22:03:33.898Z] DLT 18:02:56.298 - **POSSIBLE HANG DETECTED**
[2022-07-19T22:03:33.898Z] DLT 18:02:56.407 - Completed 100.0%. Number of tests started=5676 (+0)

[2022-07-19T22:03:33.899Z] DLT 18:02:56.734 - Load test completed
[2022-07-19T22:03:33.899Z] DLT 18:02:56.734 -   Ran     : 5676
[2022-07-19T22:03:33.899Z] DLT 18:02:56.734 -   Passed  : 5676
[2022-07-19T22:03:33.899Z] DLT 18:02:56.734 -   Failed  : 0
[2022-07-19T22:03:33.899Z] DLT 18:02:56.734 -   Result  : PASSED
[2022-07-19T22:03:33.899Z] STF 18:02:56.814 - **FAILED** Process DLT has hung
[2022-07-19T22:03:33.899Z] STF 18:02:56.814 - Collecting dumps for: DLT
[2022-07-19T22:03:33.899Z] STF 18:02:56.814 - Sending SIG 3 to the java process to generate a javacore
[2022-07-19T22:03:33.899Z] STF 18:02:56.815 - Running command: kill -3 32721
[2022-07-19T22:03:33.899Z] STF 18:02:56.815 - Redirecting stderr to /Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_mac_testList_2/aqa-tests/TKG/output_16582667523018/DaaLoadTest_daa1_CS_5m_0/20220719-174419-DaaLoadTest/results/1.DLT.kill_3.stderr
[2022-07-19T22:03:33.899Z] STF 18:02:56.815 - Redirecting stdout to /Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_mac_testList_2/aqa-tests/TKG/output_16582667523018/DaaLoadTest_daa1_CS_5m_0/20220719-174419-DaaLoadTest/results/1.DLT.kill_3.stdout
[2022-07-19T22:03:33.899Z] STF 18:02:56.819 - Pausing for 30 seconds
[2022-07-19T22:03:33.899Z] STF 18:03:26.842 - Process DLT (pid 32721) is no longer running. Abandoning dump collection.
[2022-07-19T22:03:33.899Z] STF 18:03:26.842 - Monitoring Report Summary:
[2022-07-19T22:03:33.899Z] STF 18:03:26.843 -   o Process DLT has hung
[2022-07-19T22:03:33.899Z] STF 18:03:26.844 - Killing processes: DLT
[2022-07-19T22:03:33.899Z] STF 18:03:26.844 -   o Process DLT pid 32721 is not running
[2022-07-19T22:03:33.899Z] **FAILED** at step 1 (Run daa load test). Expected return value=0 Actual=1 at /Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_mac_testList_2/aqa-tests/TKG/../TKG/output_16582667523018/DaaLoadTest_daa1_CS_5m_0/20220719-174419-DaaLoadTest/execute.pl line 95.
[2022-07-19T22:03:33.899Z] STF 18:03:26.915 - **FAILED** execute script failed. Expected return value=0 Actual=1
[2022-07-19T22:03:33.899Z] STF 18:03:26.916 - 
[2022-07-19T22:03:33.899Z] STF 18:03:26.916 - ====================   T E A R D O W N   ====================
[2022-07-19T22:03:33.899Z] STF 18:03:26.916 - Running teardown: perl /Users/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_mac_testList_2/aqa-tests/TKG/../TKG/output_16582667523018/DaaLoadTest_daa1_CS_5m_0/20220719-174419-DaaLoadTest/tearDown.pl
[2022-07-19T22:03:33.899Z] STF 18:03:26.991 - TEARDOWN stage completed
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 - 
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 - =====================   R E S U L T S   =====================
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 - Stage results:
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 -   setUp:     pass
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 -   execute:  *fail*
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 -   teardown:  pass
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 - 
[2022-07-19T22:03:33.899Z] STF 18:03:26.998 - Overall result: **FAILED**
[2022-07-19T22:03:33.899Z] 
[2022-07-19T22:03:33.899Z] DaaLoadTest_daa1_CS_5m_0_FAILED

JasonFengJ9 avatar Jul 20 '22 21:07 JasonFengJ9

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/23 DaaLoadTest_daa1_special_5m_24

pshipton avatar Aug 29 '22 14:08 pshipton

An internal build(macaarch64rt5)

openjdk version "17.0.5-ea" 2022-10-18
IBM Semeru Runtime Open Edition 17.0.5.0-m1 (build 17.0.5-ea+5)
Eclipse OpenJ9 VM 17.0.5.0-m1 (build v0.35.0-release-1de4f14ba, JRE 17 Mac OS X aarch64-64-Bit 20221017_140 (JIT enabled, AOT enabled)
OpenJ9   - 1de4f14ba
OMR      - 938f0686f
JCL      - 37e17cdb684 based on jdk-17.0.5+5)

[2022-09-08T11:14:32.135Z] variation: Mode145
[2022-09-08T11:14:32.135Z] JVM_OPTIONS:  -Xgcpolicy:gencon -Xshareclasses -Xjit -Xnocompressedrefs 
[2022-09-08T11:14:32.135Z] { itercnt=1; \

[2022-09-08T11:14:32.136Z] DLT 07:13:58.371 - Load test completed
[2022-09-08T11:14:32.136Z] DLT 07:13:58.371 -   Ran     : 147
[2022-09-08T11:14:32.136Z] DLT 07:13:58.371 -   Passed  : 147
[2022-09-08T11:14:32.136Z] DLT 07:13:58.372 -   Failed  : 0
[2022-09-08T11:14:32.136Z] DLT 07:13:58.372 -   Result  : PASSED
[2022-09-08T11:14:32.136Z] STF 07:13:58.553 - **FAILED** Process DLT has hung

[2022-09-08T11:14:32.137Z] DaaLoadTest_daa1_special_5m_8_FAILED

JasonFengJ9 avatar Sep 09 '22 18:09 JasonFengJ9

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_x86-64_windows_Release_testList_1/29 DaaLoadTest_all_special_5m_24 -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_x86-64_windows_Release_testList_1/29/system_test_output.tar.gz

DLT 15:02:28.471 - **POSSIBLE HANG DETECTED**
DLT 15:03:08.471 - Completed 100.0%. Number of tests started=80 (+0)
DLT 15:03:08.471 - **POSSIBLE HANG DETECTED**
DLT 15:03:08.580 - Completed 100.0%. Number of tests started=80 (+0)
DLT 15:03:13.502 - **POSSIBLE HANG DETECTED**
DLT 15:03:18.533 - Completed 100.0%. Number of tests started=80 (+0)
DLT 15:03:18.533 - **POSSIBLE HANG DETECTED**
DLT 15:03:23.564 - **POSSIBLE HANG DETECTED**
DLT 15:03:28.596 - **POSSIBLE HANG DETECTED**
DLT 15:03:33.611 - **POSSIBLE HANG DETECTED**
DLT 15:03:38.533 - Completed 100.0%. Number of tests started=80 (+0)
DLT 15:03:38.642 - **POSSIBLE HANG DETECTED**
DLT 15:03:43.689 - **POSSIBLE HANG DETECTED**
DLT 15:03:48.752 - **POSSIBLE HANG DETECTED**
DLT 15:03:53.783 - **POSSIBLE HANG DETECTED**
STF 15:03:58.439 - Monitoring Report Summary:
STF 15:03:58.439 -   o Process DLT has hung
STF 15:03:58.440 - Killing processes: DLT

pshipton avatar Sep 12 '22 14:09 pshipton

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/31 DaaLoadTest_daa1_special_5m_24 -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0/31/system_test_output.tar.gz

DLT 17:43:02.014 - Completed 100.0%. Number of tests started=52 (+0)
DLT 17:43:14.045 - **POSSIBLE HANG DETECTED**
STF 17:43:14.486 - **FAILED** Process DLT has hung

DDR reports the load-* threads are doing the following in multiple core files, although I don't think the stack trace is correct. j9trc JVM_OnUnload just returns JNI_OK.

    bp: 0x000000c9b88fe588 pc: 0x00007ff9462504ea C:\Windows\System32\ntdll.dll::NtQueryPerformanceCounter+0xa
    bp: 0x000000c9b88fe590 pc: 0x00007ff94621faf4 C:\Windows\System32\ntdll.dll::RtlQueryPerformanceCounter+0x54
    bp: 0x000000c9b88fe5c0 pc: 0x00007ff935e97d9f F:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9prt29.dll::j9port_isCompatible+0x2525f
    bp: 0x000000c9b88fe5f0 pc: 0x00007ff93387f4a0 F:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9trc29.dll::JVM_OnUnload+0x6900
    bp: 0x000000c9b88fe6f0 pc: 0x00007ff9338800ef F:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9trc29.dll::JVM_OnUnload+0x754f
    bp: 0x000000c9b88fe7e0 pc: 0x00007ff93387bfc9 F:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9trc29.dll::JVM_OnUnload+0x3429
    bp: 0x000000c9b88fe820 pc: 0x00007ff933879f81 F:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Personal_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9trc29.dll::JVM_OnUnload+0x13e1

infothread.txt

Actual stack from windbg with debug symbols:

ntdll!ZwQueryPerformanceCounter+0xa
ntdll!RtlQueryPerformanceCounter+0x54
j9prt29!omrtime_hires_clock+0xf
j9trc29!traceV+0x160
j9trc29!logTracePoint+0x6f
j9trc29!doTracePoint+0x109
j9trc29!javaTrace+0x61
j9jit29!initializeOSRFrame+0xfb
j9jit29!initializeOSRBuffer+0x26d
j9jit29!induceOSROnCurrentThread+0x398
j9jit29!old_slow_jitInduceOSRAtCurrentPC+0x89
j9jit29!jitInduceOSRAtCurrentPC+0xcc
0x19
0xc9`0000000b
0x8078ece0
j9jit29!jitReferenceArrayCopy+0x8e

pshipton avatar Oct 24 '22 13:10 pshipton

@keithc-ca @0xdaryl any thoughts about the stack in the previous comment? Not sure why all the test load-* threads would be stuck in this state.

pshipton avatar Oct 24 '22 14:10 pshipton

My first guess would be that those threads have encountered an infinite loop in initializeOSRBuffer().

keithc-ca avatar Oct 24 '22 14:10 keithc-ca

Attaching snap file created from a core. snap.txt

Last 100 lines

22:44:19.460117152  0x47B200 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.460137824  0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.460160240  0x47B200 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.460178064  0x47B200 j9codertvm(j9jit).75 Event       building J2I frame at 0x489a08
22:44:19.460199024  0x47B200 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x4899f8, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.460225824  0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x24100000 sp=0x489878 a0=0x489898 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489a88
22:44:19.460248896  0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.460274176  0x47B200 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.460299440  0x47B200 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.460318128  0x47B200 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
peter:Downloads$ tail -n 100 snap.txt 
22:44:19.459371568 *0x47B200 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459371568 *0x477500 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459377728 *0x46C000 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459382608 *0x473900 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.459392288 *0x47B200 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459393360 *0x477500 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.459396816 *0x46C000 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459397536 *0x473900 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459410448  0x473900 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.459415632 *0x477500 j9codertvm(j9jit).51 Event       at current PC
22:44:19.459419296 *0x46C000 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459419296 *0x47B200 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459424608 *0x473900 j9codertvm(j9jit).51 Event       at current PC
22:44:19.459437072 *0x47B200 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.459438272 *0x477500 j9vm.682             Entry      >WalkStackFrames - walkThread=0x477500 flags=0x66240000 sp=0x491258 a0=0x491278 pc=0x5 literals=0x0 els=0xc9b8cfec40 j2i=0x0
22:44:19.459445456 *0x473900 j9vm.682             Entry      >WalkStackFrames - walkThread=0x473900 flags=0x66240000 sp=0x4b5e88 a0=0x4b5ea8 pc=0x5 literals=0x0 els=0xc9b8bfed80 j2i=0x4b6458
22:44:19.459457840 *0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.459459104 *0x477500 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459471472 *0x473900 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459480416 *0x477500 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459480416 *0x47B200 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.459484640 *0x473900 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459498400  0x473900 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459501104 *0x47B200 j9codertvm(j9jit).75 Event       building J2I frame at 0x489a08
22:44:19.459502176 *0x477500 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459511824 *0x473900 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459527184 *0x477500 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459527184 *0x47B200 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x4899f8, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.459527376 *0x473900 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.459547008 *0x477500 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.459548016 *0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x24100000 sp=0x489a18 a0=0x489a38 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489fd8
22:44:19.459551616 *0x473900 j9vm.683             Exit       <WalkStackFrames - walkThread=0x473900, rc=0
22:44:19.459568048 *0x477500 j9vm.683             Exit       <WalkStackFrames - walkThread=0x477500, rc=0
22:44:19.459571984 *0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.459577424 *0x473900 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.459589200 *0x477500 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.459590896 *0x473900 j9codertvm(j9jit).75 Event       building J2I frame at 0x4b5e78
22:44:19.459594720 *0x47B200 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.459604224 *0x473900 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x4b5e68, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.459610256 *0x477500 j9codertvm(j9jit).75 Event       building J2I frame at 0x491248
22:44:19.459616816 *0x47B200 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459630096 *0x473900 j9vm.682             Entry      >WalkStackFrames - walkThread=0x473900 flags=0x24100000 sp=0x4b5e88 a0=0x4b5ea8 pc=0x5 literals=0x0 els=0xc9b8bfed80 j2i=0x4b6458
22:44:19.459631184 *0x477500 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x491238, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.459637232 *0x47B200 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.459653776 *0x473900 j9vm.683             Exit       <WalkStackFrames - walkThread=0x473900, rc=0
22:44:19.459655664 *0x477500 j9vm.682             Entry      >WalkStackFrames - walkThread=0x477500 flags=0x24100000 sp=0x4910b8 a0=0x4910d8 pc=0x5 literals=0x0 els=0xc9b8cfec40 j2i=0x4912c8
22:44:19.459656592 *0x47B200 j9codertvm(j9jit).51 Event       at current PC
22:44:19.459668240 *0x473900 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.459678096 *0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x66240000 sp=0x489a18 a0=0x489a38 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489fd8
22:44:19.459678096 *0x477500 j9vm.683             Exit       <WalkStackFrames - walkThread=0x477500, rc=0
22:44:19.459681936 *0x473900 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459695360  0x473900 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.459700128 *0x47B200 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459701200 *0x477500 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.459719984 *0x473900 j9codertvm(j9jit).51 Event       at current PC
22:44:19.459722560 *0x477500 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459722560 *0x47B200 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459744224  0x47B200 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459744224 *0x473900 j9vm.682             Entry      >WalkStackFrames - walkThread=0x473900 flags=0x66240000 sp=0x4b5e88 a0=0x4b5ea8 pc=0x5 literals=0x0 els=0xc9b8bfed80 j2i=0x4b6458
22:44:19.459744848 *0x477500 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.459765616 *0x47B200 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459766672 *0x477500 j9codertvm(j9jit).51 Event       at current PC
22:44:19.459787680  0x477500 j9vm.682             Entry      >WalkStackFrames - walkThread=0x477500 flags=0x66240000 sp=0x4910b8 a0=0x4910d8 pc=0x5 literals=0x0 els=0xc9b8cfec40 j2i=0x4912c8
22:44:19.459787680 *0x47B200 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.459808080  0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.459809152 *0x477500 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459829536 *0x47B200 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.459830608 *0x477500 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.459854272  0x477500 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.459854272 *0x47B200 j9codertvm(j9jit).75 Event       building J2I frame at 0x489a08
22:44:19.459874496  0x47B200 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x4899f8, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.459874496 *0x477500 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.459895296  0x477500 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.459896640 *0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x24100000 sp=0x489a18 a0=0x489a38 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489fd8
22:44:19.459916320 *0x477500 j9vm.683             Exit       <WalkStackFrames - walkThread=0x477500, rc=0
22:44:19.459920384 *0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.459937504 *0x477500 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.459942832 *0x47B200 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.459958032 *0x477500 j9codertvm(j9jit).75 Event       building J2I frame at 0x4910a8
22:44:19.459963920 *0x47B200 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.459978880 *0x477500 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x491098, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.459984048 *0x47B200 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)
22:44:19.460003616  0x47B200 j9codertvm(j9jit).51 Event       at current PC
22:44:19.460003616 *0x477500 j9vm.682             Entry      >WalkStackFrames - walkThread=0x477500 flags=0x24100000 sp=0x491008 a0=0x491028 pc=0x5 literals=0x0 els=0xc9b8cfec40 j2i=0x4912c8
22:44:19.460027936  0x477500 j9vm.683             Exit       <WalkStackFrames - walkThread=0x477500, rc=0
22:44:19.460027936 *0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x66240000 sp=0x489a18 a0=0x489a38 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489fd8
22:44:19.460046896  0x47B200 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.460069168  0x47B200 j9codertvm(j9jit).58 Event       outgoing arg count is 2
22:44:19.460084944  0x47B200 j9codertvm(j9jit).63 Exit       <decompileMethodFrameIterator
22:44:19.460101136  0x47B200 j9codertvm(j9jit).57 Entry      >decompileMethodFrameIterator
22:44:19.460117152  0x47B200 j9codertvm(j9jit).61 Exit       <decompileMethodFrameIterator - finish: previous is JIT - copy preserved register values
22:44:19.460137824  0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.460160240  0x47B200 j9codertvm(j9jit).71 Entry      >performDecompile
22:44:19.460178064  0x47B200 j9codertvm(j9jit).75 Event       building J2I frame at 0x489a08
22:44:19.460199024  0x47B200 j9codertvm(j9jit).114 Exit       <performDecompile - new SP = 0x4899f8, literals = 0x1c69f8, pc = 0xc9b2234158
22:44:19.460225824  0x47B200 j9vm.682             Entry      >WalkStackFrames - walkThread=0x47b200 flags=0x24100000 sp=0x489878 a0=0x489898 pc=0x5 literals=0x0 els=0xc9b8efeb40 j2i=0x489a88
22:44:19.460248896  0x47B200 j9vm.683             Exit       <WalkStackFrames - walkThread=0x47b200, rc=0
22:44:19.460274176  0x47B200 j9codertvm(j9jit).59 Event       pending count from stack mapper is 0
22:44:19.460299440  0x47B200 j9codertvm(j9jit).33 Entry      >jitInterpreterPCFromWalkState - Lookup interp PC for 0x7ff91f02e25d
22:44:19.460318128  0x47B200 j9codertvm(j9jit).37 Exit       <jitInterpreterPCFromWalkState - interp pc = 0xc9b2234158 (stack map)

pshipton avatar Oct 24 '22 14:10 pshipton

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_3/111 DaaLoadTest_all_special_5m_24

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_3/111/system_test_output.tar.gz

DLT 16:52:37.626 - Completed 100.0%. Number of tests started=79 (+0)
DLT 16:52:57.220 - **POSSIBLE HANG DETECTED**

pshipton avatar Nov 07 '22 15:11 pshipton

JDK19 internal build(win16x64rt-1-6)

openjdk version "19.0.1-beta" 2022-10-18
IBM Semeru Runtime Open Edition 19.0.1+10-202211230455 (build 19.0.1-beta+10-202211230455)
Eclipse OpenJ9 VM 19.0.1+10-202211230455 (build master-55c52ff26, JRE 19 Windows Server 2016 amd64-64-Bit Compressed References 20221123_75 (JIT enabled, AOT enabled)
OpenJ9   - 55c52ff26
OMR      - 310f0efc8
JCL      - 720d535776 based on jdk-19.0.1+10)

[2022-11-23T08:18:44.314Z] variation: Mode107-OSRG
[2022-11-23T08:18:44.727Z] JVM_OPTIONS:  -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation 

[2022-11-23T08:39:39.948Z] DLT 00:39:56.014 - **POSSIBLE HANG DETECTED**
[2022-11-23T08:39:39.948Z] STF 00:39:56.026 - **FAILED** Process DLT has hung

[2022-11-23T08:41:15.712Z] DaaLoadTest_daa1_special_5m_24_FAILED

JasonFengJ9 avatar Nov 23 '22 22:11 JasonFengJ9

JDK11 0.36.0 milestone 1 build(macaarch64rt6)

openjdk version "11.0.18" 2023-01-17
IBM Semeru Runtime Open Edition 11.0.18.0-m1 (build 11.0.18+4)
Eclipse OpenJ9 VM 11.0.18.0-m1 (build v0.36.0-release-f98e55973, JRE 11 Mac OS X aarch64-64-Bit 20221130_200 (JIT enabled, AOT enabled)
OpenJ9   - f98e55973
OMR      - acdbafc4a
JCL      - acdae91f88 based on jdk-11.0.18+4)

[2022-12-01T09:13:57.064Z] variation: Mode107-OSRG
[2022-12-01T09:13:57.064Z] JVM_OPTIONS:  -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation 

[2022-12-01T09:17:46.007Z] DLT 04:17:34.972 - Completed 67.6%. Number of tests started=3246 (+419)
[2022-12-01T09:35:29.464Z] DLT 04:35:03.340 - Completed 100.0%. Number of tests started=3516 (+270)
[2022-12-01T09:35:29.464Z] DLT 04:35:03.340 - **POSSIBLE HANG DETECTED**
[2022-12-01T09:35:29.464Z] STF 04:35:03.440 - **FAILED** Process DLT has hung
[2022-12-01T09:35:29.464Z] STF 04:35:03.440 - Collecting dumps for: DLT
[2022-12-01T09:35:29.464Z] STF 04:35:03.440 - Sending SIG 3 to the java process to generate a javacore
[2022-12-01T09:35:29.464Z] STF 04:35:03.441 - Running command: kill -3 54781

[2022-12-01T09:35:34.179Z] STF 04:35:33.923 - Overall result: **FAILED**
[2022-12-01T09:35:34.179Z] -----------------------------------
[2022-12-01T09:35:34.179Z] DaaLoadTest_daa3_special_5m_24_FAILED
[2022-12-01T09:40:38.236Z] variation: Mode104
[2022-12-01T09:40:38.236Z] JVM_OPTIONS:  -Xgcpolicy:optthruput -Xjit:count=0,optlevel=hot,gcOnResolve,rtResolve -Xnocompressedrefs 

[2022-12-01T09:43:57.893Z] DLT 04:43:51.488 - Completed 60.3%. Number of tests started=4079 (+321)
[2022-12-01T10:08:25.246Z] STF 05:07:21.311 - Heartbeat: Process DLT is still running
[2022-12-01T10:08:25.246Z] DLT 05:07:21.348 - Completed 100.0%. Number of tests started=4169 (+90)
[2022-12-01T10:08:25.246Z] DLT 05:07:21.351 - **POSSIBLE HANG DETECTED**
[2022-12-01T10:08:25.246Z] DLT 05:07:21.464 - Completed 100.0%. Number of tests started=4169 (+0)
[2022-12-01T10:08:25.246Z] DLT 05:07:21.485 - Thread completed. Suite=0 thread=1
[2022-12-01T10:08:25.246Z] DLT 05:07:21.566 - Completed 100.0%. Number of tests started=4169 (+0)
[2022-12-01T10:08:25.246Z] DLT 05:07:21.671 - Completed 100.0%. Number of tests started=4169 (+0)
[2022-12-01T10:08:25.246Z] DLT 05:07:21.776 - Completed 100.0%. Number of tests started=4169 (+0)
[2022-12-01T10:08:25.246Z] STF 05:07:21.828 - **FAILED** Process DLT has hung
[2022-12-01T10:08:25.246Z] STF 05:07:21.828 - Collecting dumps for: DLT
[2022-12-01T10:08:25.246Z] STF 05:07:21.828 - Sending SIG 3 to the java process to generate a javacore
[2022-12-01T10:08:25.246Z] STF 05:07:21.829 - Running command: kill -3 55436

[2022-12-01T10:08:25.246Z] DaaLoadTest_all_special_5m_2_FAILED

JasonFengJ9 avatar Dec 02 '22 23:12 JasonFengJ9