openj9
aarch64 mac SharedClasses.SCM23.MultiCL_0 abort signal received
https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/97 SharedClasses.SCM23.MultiCL_0
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/97/system_test_output.tar.gz
MCL2 12:37:51 >> Loaded 17000 classes...
STF 12:37:51.370 - Found dump at: /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/javacore.20220713.123742.24808.0002.txt
MCL5 stderr JVMDUMP010I System dump written to /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/core.20220713.123742.24808.0001.dmp
MCL5 stderr JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/javacore.20220713.123742.24808.0002.txt' in response to an event
MCL2 stderr javacore file generated - /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/javacore.20220713.123742.24808.0002.txt
MCL3 12:37:51 >> Loaded 15000 classes...
MCL1 12:37:53 >> Loaded 14000 classes...
MCL2 12:37:54 >> Loaded 18000 classes...
MCL4 12:37:55 >> Loaded 15000 classes...
MCL3 12:37:56 >> Loaded 16000 classes...
MCL2 12:37:59 >> Loaded 19000 classes...
MCL4 12:37:59 >> Loaded 16000 classes...
MCL2 12:38:03 >> Loaded 20000 classes...
MCL2 12:38:03 >> Total classes loaded = 20001
MCL4 12:38:04 >> Loaded 17000 classes...
MCL1 12:38:04 >> Loaded 15000 classes...
MCL4 12:38:07 >> Loaded 18000 classes...
MCL3 12:38:08 >> Loaded 17000 classes...
MCL4 12:38:10 >> Loaded 19000 classes...
MCL1 12:38:11 >> Loaded 16000 classes...
MCL1 12:38:17 >> Loaded 17000 classes...
MCL3 12:38:18 >> Loaded 18000 classes...
MCL4 12:38:21 >> Loaded 20000 classes...
MCL4 12:38:21 >> Total classes loaded = 20001
MCL1 12:38:21 >> Loaded 18000 classes...
MCL3 12:38:24 >> Loaded 19000 classes...
MCL1 12:38:25 >> Loaded 19000 classes...
MCL1 12:38:28 >> Loaded 20000 classes...
MCL1 12:38:28 >> Total classes loaded = 20001
MCL3 12:38:30 >> Loaded 20000 classes...
MCL3 12:38:30 >> Total classes loaded = 20001
STF 12:38:31.117 - Monitoring Report Summary:
STF 12:38:31.117 - o Process MCL1 ended with the expected exit code (0)
STF 12:38:31.117 - o Process MCL2 has crashed unexpectedly
STF 12:38:31.117 - o Process MCL3 ended with the expected exit code (0)
STF 12:38:31.117 - o Process MCL4 ended with the expected exit code (0)
STF 12:38:31.117 - o Process MCL5 ended with the expected exit code (0)
STF 12:38:31.120 - Killing processes: MCL1 MCL2 MCL3 MCL4 MCL5
It seems MCL5 received an abort signal, for reasons that are not clear, and created some diagnostic files. MCL2 noticed the diagnostic files and ended unexpectedly.
5.MCL5.stderr
JVMDUMP039I Processing dump event "abort", detail "" at 2022/07/13 12:37:42 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/core.20220713.123742.24808.0001.dmp' in response to an event
JVMDUMP010I System dump written to /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/core.20220713.123742.24808.0001.dmp
JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/javacore.20220713.123742.24808.0002.txt' in response to an event
5.MCL2.stderr
core file generated - /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/core.20220713.123742.24808.0001.dmp
javacore file generated - /Users/jenkins/workspace/Test_openjdk18_j9_extended.system_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/output_16576782091243/SharedClasses.SCM23.MultiCL_0/20220713-123437-SharedClasses/results/javacore.20220713.123742.24808.0002.txt
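To check which JVM produced a given diagnostic file, the fields embedded in the file name can be pulled apart. The following is a minimal sketch, assuming the names follow the pattern seen in this issue (kind, date, time, process ID, sequence number, extension); `parse_dump_name` and `DumpInfo` are hypothetical helper names, not part of STF or OpenJ9.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern, taken from the file names quoted in this issue:
#   <kind>.<yyyymmdd>.<hhmmss>.<pid>.<seq>.<ext>
DUMP_NAME = re.compile(
    r"(?P<kind>core|javacore|Snap|jitdump)\."
    r"(?P<date>\d{8})\.(?P<time>\d{6})\.(?P<pid>\d+)\.(?P<seq>\d{4})\.\w+$"
)


class DumpInfo(NamedTuple):
    kind: str
    date: str
    time: str
    pid: int
    seq: int


def parse_dump_name(path: str) -> Optional[DumpInfo]:
    """Return the fields encoded in a dump file name, or None if it does not match."""
    m = DUMP_NAME.search(path)
    if m is None:
        return None
    return DumpInfo(m["kind"], m["date"], m["time"], int(m["pid"]), int(m["seq"]))


if __name__ == "__main__":
    for name in (
        "results/core.20220713.123742.24808.0001.dmp",
        "results/javacore.20220713.123742.24808.0002.txt",
    ):
        print(parse_dump_name(name))
```

Both files quoted above carry process ID 24808 and the time stamp 123742, so they come from a single JVM even though they are mentioned in the stderr of both MCL5 and MCL2.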
Perhaps related to https://github.com/eclipse-openj9/openj9/issues/15352 @knn-k fyi
This may be a result of changes to the test framework to send an abort signal on a hang. @Mesbah-Alam there are no messages about a hang.
Are you referring to messages like "FAILED Process LT has timed out" (e.g. https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/1079/consoleFull)?
The recent change only adds the -Xdump:system:events=user option to the command line, to ensure a system core gets generated when STF kills a hanging process with kill -3. The functionality of STF sending a kill -3 in the event of a hang has not been touched though.
The test in question here is not "hanging" though, which is the case in which STF would send a kill -3. It's not clear why 20220713-123437-SharedClasses/results/core.20220713.123742.24808.0001.dmp got generated.
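For readers unfamiliar with the mechanism described here, the following is a rough sketch of a harness that adds the dump option and sends the equivalent of kill -3 (SIGQUIT) only when a process exceeds its time limit. It is not STF's actual code; `with_user_dump`, `run_with_hang_dump`, and the timeout parameters are hypothetical names chosen for illustration.

```python
import signal
import subprocess
from typing import List, Optional, Sequence

USER_DUMP_OPTION = "-Xdump:system:events=user"  # option quoted in this thread


def with_user_dump(java_cmd: Sequence[str]) -> List[str]:
    """Return java_cmd with the dump option inserted right after the launcher."""
    cmd = list(java_cmd)
    if USER_DUMP_OPTION not in cmd:
        cmd.insert(1, USER_DUMP_OPTION)
    return cmd


def run_with_hang_dump(
    java_cmd: Sequence[str], timeout_s: float, grace_s: float = 60.0
) -> Optional[int]:
    """Run the command; on timeout send SIGQUIT, wait, then kill.

    Returns the exit code, or None if the process had to be killed.
    """
    proc = subprocess.Popen(with_user_dump(java_cmd))
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Equivalent of `kill -3 <pid>`: only sent when the process hangs.
        proc.send_signal(signal.SIGQUIT)
        try:
            # Give the JVM time to write its dumps before killing it.
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            pass
        proc.kill()
        proc.wait()
        return None
```

In this sketch a process that finishes within its time limit never receives a signal, which matches the point made above: a dump triggered by an "abort" event in a test that did not hang would have to come from somewhere else.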
Ah right, we talked about sending the abort signal but then decided on another approach.
Another one: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/119/tapResults/
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_extended.system_aarch64_mac_Nightly_testList_2/5/ - mac11-aarch64-6 SharedClasses.SCM23.MultiCL_0
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_extended.system_aarch64_mac_Nightly_testList_2/5/system_test_output.tar.gz
MCL4 stderr JVMDUMP042W Abort signal received while running on Java stack. The JVM dump agents could not be run.
6.MCL5.stderr
JVMDUMP039I Processing dump event "abort", detail "" at 2022/09/13 13:07:01 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Test_openjdk19_j9_extended.system_aarch64_mac_Nightly_testList_2/aqa-tests/TKG/output_16630366747823/SharedClasses.SCM23.MultiCL_0/20220913-130103-SharedClasses/results/core.20220913.130701.63810.0001.dmp' in response to an event
Is the abort signal what happens on Mac when there aren't enough resources? Similar to the OS issuing a kill -9 on Linux?
I have no idea how SIGABRT is used on macOS. Sending the signal to an OpenJ9 process generated the following output.
% kill -ABRT 69987
JVMDUMP039I Processing dump event "abort", detail "" at 2022/09/14 09:06:22 - please wait.
JVMDUMP032I JVM requested System dump using '/Users/openj9/core.20220914.090622.69987.0001.dmp' in response to an event
JVMDUMP010I System dump written to /Users/openj9/core.20220914.090622.69987.0001.dmp
JVMDUMP032I JVM requested Java dump using '/Users/openj9/javacore.20220914.090622.69987.0002.txt' in response to an event
JVMDUMP010I Java dump written to /Users/openj9/javacore.20220914.090622.69987.0002.txt
JVMDUMP032I JVM requested Snap dump using '/Users/openj9/Snap.20220914.090622.69987.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /Users/openj9/Snap.20220914.090622.69987.0003.trc
JVMDUMP032I JVM requested JIT dump using '/Users/openj9/jitdump.20220914.090622.69987.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'SIGABRT Thread' thread 0x000000011A020300
JVMDUMP010I JIT dump written to /Users/openj9/jitdump.20220914.090622.69987.0004.dmp
JVMDUMP013I Processed dump event "abort", detail "".
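In the output above every JVMDUMP032I request is followed by a JVMDUMP010I "written" message. When triaging stderr files from failing runs, a small script can flag requests that have no matching "written" line. This is a sketch that assumes the message wording shown in this thread and paths without spaces; `unfinished_dumps` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable

# Message shapes copied from the output quoted in this thread.
REQUESTED = re.compile(
    r"JVMDUMP032I JVM requested (?P<kind>\w+) dump using '(?P<path>[^']+)'"
)
WRITTEN = re.compile(r"JVMDUMP010I (?P<kind>\w+) dump written to (?P<path>\S+)")


def unfinished_dumps(lines: Iterable[str]) -> Dict[str, str]:
    """Map path -> dump kind for requests that have no 'written' message."""
    pending: Dict[str, str] = {}
    for line in lines:
        m = REQUESTED.search(line)
        if m:
            pending[m["path"]] = m["kind"]
            continue
        m = WRITTEN.search(line)
        if m:
            # The dump completed, so it is no longer pending.
            pending.pop(m["path"], None)
    return pending


if __name__ == "__main__":
    import sys

    # Usage: python unfinished_dumps.py 5.MCL5.stderr
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for path, kind in unfinished_dumps(f).items():
            print(f"{kind} dump requested but not reported written: {path}")
```

A request without a "written" line does not prove the dump failed, since the excerpt may simply be truncated, but it narrows down which files to look for in the results directory.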
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/138 - mac11-aarch64-7 SharedClasses.SCM23.MultiCL_0
No diagnostics or messages that I found. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_extended.system_aarch64_mac_Nightly_testList_0/138/system_test_output.tar.gz
STF 10:45:13.192 - **FAILED** Process MCL2 ended with exit code (255) and not the expected exit code/s (0)
Putting the behavior with abort signals aside, PR #15907 should fix the crashes with SharedClasses.SCM23.* tests.
Another one, from before the potential fix was merged. Apparently the abort can mean the process is attempting to write to memory it doesn't own.
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_extended.system_aarch64_mac_Nightly_testList_2/10/ SharedClasses.SCM23.MultiCL_0
MCL5 13:19:54 >> Total classes loaded = 20001
MCL5 stderr JVMDUMP039I Processing dump event "abort", detail "" at 2022/09/20 13:19:54 - please wait.