openj9
openj9 copied to clipboard
JDK 18/19 jdk_vector_0_FAILED Unexpected exit from test [exit code: 137]
Failure link
From an internal build job/Test_openjdknext_j9_extended.openjdk_aarch64_linux_Personal/1/consoleFull
(ub18-aarch64-7
):
11:36:29 openjdk version "19-internal" 2022-09-20
11:36:29 OpenJDK Runtime Environment (build 19-internal-adhoc.jenkins.BuildJDKnextaarch64linuxPersonal)
11:36:29 Eclipse OpenJ9 VM (build exclude19-52f04efbff5, JRE 19 Linux aarch64-64-Bit Compressed References 20220607_35 (JIT enabled, AOT enabled)
11:36:29 OpenJ9 - 52f04efbff5
11:36:29 OMR - c60867497c6
11:36:29 JCL - 5ccf02de16a based on jdk-19+25)
Rerun in Grinder - Change TARGET to run only the failed test targets.
Optional info
Failure output (captured from console output)
14:55:40 ===============================================
14:55:40 Running test jdk_vector_0 ...
14:55:40 ===============================================
14:55:40 jdk_vector_0 Start Time: Tue Jun 7 14:55:40 2022 Epoch Time (ms): 1654628140115
14:55:40 "/home/jenkins/workspace/Test_openjdknext_j9_extended.openjdk_aarch64_linux_Personal/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdknext_j9_extended.openjdk_aarch64_linux_Personal/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
14:55:40 JVMSHRC005I No shared class caches available
14:55:40 JVMSHRC005I No shared class caches available
14:55:40 cache cleanup done
14:55:40 variation: Mode150
14:55:40 JVM_OPTIONS: -XX:+UseCompressedOops
15:00:05 TEST: jdk/incubator/vector/Byte128VectorTests.java
15:00:05 TEST: jdk/incubator/vector/Byte256VectorTests.java
15:11:53 TEST: jdk/incubator/vector/Int128VectorTests.java
15:23:13 TEST: jdk/incubator/vector/Short256VectorTests.java
15:23:13 TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
15:23:13 --------------------------------------------------
15:33:12 Test results: passed: 67; failed: 4
15:33:31 jdk_vector_0_FAILED
@gita-omr
Looking.
An internal build(cent7-aarch64-6
)
openjdk version "18.0.1.1" 2022-04-22
IBM Semeru Runtime Open Edition 18.0.2.0-m2 (build 18.0.1.1+2)
Eclipse OpenJ9 VM 18.0.2.0-m2 (build master-8ead00aa6, JRE 18 Linux aarch64-64-Bit Compressed References 20220714_19 (JIT enabled, AOT enabled)
OpenJ9 - 8ead00aa6
OMR - 29756eb80
JCL - c0b6f04919 based on jdk-18.0.1.1+2)
[2022-07-14T23:45:13.370Z] variation: Mode150
[2022-07-14T23:45:13.370Z] JVM_OPTIONS: -XX:+UseCompressedOops
[2022-07-15T00:13:52.837Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
[2022-07-15T00:13:52.837Z] --------------------------------------------------
[2022-07-15T00:22:18.492Z] Test results: passed: 71; failed: 1
[2022-07-15T00:22:30.546Z] Report written to /home/jenkins/workspace/Test_openjdk18_j9_extended.openjdk_aarch64_linux/jvmtest/openjdk/report/html/report.html
[2022-07-15T00:22:30.546Z] Results written to /home/jenkins/workspace/Test_openjdk18_j9_extended.openjdk_aarch64_linux/aqa-tests/TKG/output_16578313613568/jdk_vector_0/work
[2022-07-15T00:22:30.546Z] Error: Some tests failed or other problems occurred.
[2022-07-15T00:22:30.546Z]
[2022-07-15T00:22:30.546Z] jdk_vector_0_FAILED
FYI @Akira1Saitoh
An internal build(ub18-aarch64-8
)
[2022-08-04T22:30:39.700Z] variation: Mode150
[2022-08-04T22:30:39.700Z] JVM_OPTIONS: -XX:+UseCompressedOops
[2022-08-04T22:50:43.793Z] TEST: jdk/incubator/vector/Long128VectorTests.java
[2022-08-04T22:50:43.806Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
[2022-08-04T22:50:43.806Z] --------------------------------------------------
[2022-08-04T22:52:40.943Z] TEST: jdk/incubator/vector/LongMaxVectorTests.java
[2022-08-04T22:52:40.951Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
[2022-08-04T22:52:40.951Z] --------------------------------------------------
[2022-08-04T22:58:11.608Z] TEST: jdk/incubator/vector/ShortMaxVectorTests.java
[2022-08-04T22:58:11.614Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 137]
[2022-08-04T22:58:11.614Z] --------------------------------------------------
[2022-08-04T23:06:28.986Z] Test results: passed: 69; failed: 3
[2022-08-04T23:06:34.769Z] Report written to /home/jenkins/workspace/Test_openjdk18_j9_extended.openjdk_aarch64_linux/jvmtest/openjdk/report/html/report.html
[2022-08-04T23:06:34.769Z] Results written to /home/jenkins/workspace/Test_openjdk18_j9_extended.openjdk_aarch64_linux/aqa-tests/TKG/output_16596423323148/jdk_vector_0/work
[2022-08-04T23:06:34.769Z] Error: Some tests failed or other problems occurred.
[2022-08-04T23:06:34.769Z]
[2022-08-04T23:06:34.769Z] jdk_vector_0_FAILED
@knn-k any update on this?
It seems that Linux kernel OOM killer terminates test java process of jdk_vector_0
. A certain test spends more than 4GB of memory and multiple tests are running in parallel. Hence, it uses up the system memory (8GB in this case), resulting in OOM killer. Below is the excerpt from dmesg on an internal machine (ub18-aarch64).
[3340113.392290] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[3340113.392296] [ 498] 0 498 68302 2111 540672 1553 0 systemd-journal
[3340113.392298] [ 507] 0 507 21532 2 53248 66 0 lvmetad
[3340113.392300] [ 517] 0 517 4212 116 57344 205 -1000 systemd-udevd
[3340113.392302] [ 667] 100 667 2842 29 61440 127 0 systemd-network
[3340113.392303] [ 685] 0 685 1631 22 57344 110 0 rpcbind
[3340113.392305] [ 794] 0 794 96581 106 110592 347 0 udisksd
[3340113.392306] [ 801] 103 801 6205 160 90112 329 -900 dbus-daemon
[3340113.392308] [ 845] 0 845 912 8 49152 45 0 atd
[3340113.392309] [ 851] 0 851 20096 37 53248 61 0 irqbalance
[3340113.392311] [ 858] 0 858 166867 0 126976 249 0 lxcfs
[3340113.392313] [ 864] 0 864 26124 2 98304 2042 0 networkd-dispat
[3340113.392314] [ 866] 0 866 2675 27 57344 46 0 cron
[3340113.392316] [ 875] 0 875 59565 49 98304 230 0 accounts-daemon
[3340113.392318] [ 876] 102 876 59380 133 106496 591 0 rsyslogd
[3340113.392319] [ 889] 0 889 57868 100 81920 146 0 polkitd
[3340113.392321] [ 899] 101 899 2620 21 61440 161 0 systemd-resolve
[3340113.392323] [ 1121] 0 1121 444036 1599 364544 4125 -999 containerd
[3340113.392324] [ 1138] 111 1138 22000 35 86016 329 0 ntpd
[3340113.392326] [ 1143] 0 1143 2605 2 61440 181 -1000 sshd
[3340113.392328] [ 1310] 0 1310 365408 2703 442368 7434 -500 dockerd
[3340113.392329] [ 1518] 0 1518 2335 0 53248 34 0 agetty
[3340113.392331] [ 1532] 0 1532 2391 0 53248 35 0 agetty
[3340113.392333] [ 1002] 0 1002 3090 0 61440 283 0 systemd
[3340113.392334] [ 1003] 0 1003 43283 1 110592 746 0 (sd-pam)
[3340113.392337] [15684] 0 15684 6974 2 94208 475 0 sshd
[3340113.392339] [15751] 1000 15751 8127 51 106496 1581 0 sshd
[3340113.392340] [15806] 1000 15806 1557255 9367 819200 27520 0 java
[3340113.392342] [21187] 1000 21187 321581 0 368640 2606 0 Xvfb
[3340113.392344] [21230] 1000 21230 480 5 40960 20 0 sh
[3340113.392346] [21231] 1000 21231 480 19 40960 10 0 sh
[3340113.392347] [21232] 1000 21232 480 0 36864 26 0 sh
[3340113.392348] [21238] 1000 21238 2311 2 53248 86 0 make
[3340113.392350] [21284] 1000 21284 2340 2 53248 117 0 make
[3340113.392351] [21388] 1000 21388 480 0 36864 25 0 sh
[3340113.392353] [21389] 1000 21389 2339 1 53248 119 0 make
[3340113.392354] [21391] 1000 21391 480 0 45056 25 0 sh
[3340113.392355] [21392] 1000 21392 2342 2 49152 127 0 make
[3340113.392357] [21451] 1000 21451 480 0 45056 27 0 sh
[3340113.392358] [21452] 1000 21452 480 0 45056 30 0 sh
[3340113.392359] [21453] 1000 21453 2211 0 53248 27 0 tee
[3340113.392361] [21455] 1000 21455 1154331 5839 622592 10454 0 java
[3340113.392362] [21714] 1000 21714 777895 0 696320 30295 0 java
[3340113.392364] [21743] 1000 21743 777508 0 679936 29266 0 java
[3340113.392365] [21773] 1000 21773 777766 983 700416 28785 0 java
[3340113.392367] [23404] 1000 23404 1621004 733817 7254016 65847 0 java
[3340113.392369] [23565] 1000 23565 1891469 1000758 9441280 24641 0 java
[3340113.392370] [23705] 1000 23705 964486 158287 2023424 8570 0 java
[3340113.392372] [23769] 1000 23769 2209 17 53248 0 0 sleep
[3340113.392374] [23777] 0 23777 2532 135 57344 0 0 systemd-logind
[3340113.392375] Out of memory: Kill process 23565 (java) score 448 or sacrifice child
[3340113.438997] Killed process 23565 (java) total-vm:7565876kB, anon-rss:4003052kB, file-rss:0kB, shmem-rss:0kB
[3340113.654549] oom_reaper: reaped process 23565 (java), now anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
@gita-omr - Any new news?
@Akira1Saitoh got interesting findings. The tests are actually very small but looks like a lot of them are running in parallel. I am not very familiar with testng
but maybe there are settings to run fewer processes/threads? I think this would be the first thing to try. @JasonFengJ9
The tests are actually very small but looks like a lot of them are running in parallel. I am not very familiar with testng but maybe there are settings to run fewer processes/threads?
@gita-omr are you referring to run individual tests like jdk/incubator/vector/Byte128VectorTests.java
(there is a grinder option to do so) instead of jdk_vector_0
?
@gita-omr are you referring to run individual tests like
jdk/incubator/vector/Byte128VectorTests.java
(there is a grinder option to do so) instead ofjdk_vector_0
?
Maybe we can try that?
Is it possible to use jtreg option to set the concurrency to lower value?
I am seeing the failure on the internal machine with 8 cpu cores and 8 GB memory. The test does not fail on another machine with 8 cpu cores and 16 GB memory.
@gita-omr are you referring to run individual tests like jdk/incubator/vector/Byte128VectorTests.java (there is a grinder option to do so) instead of jdk_vector_0?
Maybe we can try that?
An example internal build from https://github.com/eclipse-openj9/openj9/issues/16244
Is it possible to use jtreg option to set the concurrency to lower value?
@llxia any insights?
Is it possible to use jtreg option to set the concurrency to lower value?
In OpenJDK tests, we set concurrency based on the # following: min(NPROCS/2, MEM_IN_GB/2)
. To change concurrency, change L66 in your branch.
https://github.com/adoptium/aqa-tests/blob/d5e241df7fe9a7648520efe4ebe1defc5fa5fd7d/openjdk/openjdk.mk#L46-L66
Hmm. the concurrency on aarch64 machine (8GB memory) is 3 while it is 1 on x86 machine (ub20x64rt3) with 4GB memory. This might be the reason why the test only fails on aarch64.
on aarch64:
$ cat Makefile
MEMORY_SIZE:=$(shell KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; if [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`; else CGMEM=`expr $${KMEMMB} \* 1024`; fi; CGMEMMB=`expr $${CGMEM} / 1048576`; if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; fi)
All:
@echo ${MEMORY_SIZE}
$ make
7973
$ cat /proc/meminfo
MemTotal: 8165204 kB
on x86:
$ make
3931
$ cat /proc/meminfo
MemTotal: 4025600 kB
@llxia Thanks for the pointer. Is it possible to change the concurrency only for jdk_vector test?
@Akira1Saitoh I think it is possible to do it with new parameters:
In https://github.com/adoptium/aqa-tests/blob/d5e241df7fe9a7648520efe4ebe1defc5fa5fd7d/openjdk/openjdk.mk,
EXTRA_JTREG_OPTIONS += -concurrency:$(JTREG_CONC)
...
# set JTREG_BASIC_OPTIONS value into a new parameter before adding EXTRA_JTREG_OPTIONS
JTREG_BASIC_OPTIONS_WO_EXTRA_OPTS = $(JTREG_BASIC_OPTIONS)
JTREG_BASIC_OPTIONS += $(EXTRA_JTREG_OPTIONS)
# add another new parameter for concurrency
SPECIAL_CONCURRENCY=xxxx
In https://github.com/adoptium/aqa-tests/blob/d5e241df7fe9a7648520efe4ebe1defc5fa5fd7d/openjdk/playlist.xml#L1832, replace $(JTREG_BASIC_OPTIONS)
with $(JTREG_BASIC_OPTIONS_WO_EXTRA_OPTS) $(SPECIAL_CONCURRENCY)
Thanks @llxia. I opened https://github.com/adoptium/aqa-tests/pull/4119.
Launched the internal Grinder job job/Grinder/29324/
with that change.
Launched the internal Grinder job job/Grinder/29324/ with that change.
Grinder/29324 passed 8 times. It failed because FlowNode was not found
at the end, which is unrelated to the change.
Re-launched the job. (job/Grinder/29389/
)
Grinder/29389 passed 13 times. It failed due to FlowNode was not found
error again.
Re-launched the Grinder. 20 iterations passed. (/job/Grinder/29428/
)
@Akira1Saitoh Do you think this will be resolved within 2 weeks?
@tajila https://github.com/adoptium/aqa-tests/pull/4119 was already merged. Since then, I have not seen this failure in several Grinder runs. I believe this issue can be closed now. To address the excessive memory consumption when running jdk_vector_0, I will open another issue. JIT uses up the scratch memory (1.5GB) when compiling some methods in the test.
I opened https://github.com/eclipse-openj9/openj9/issues/16309.
@Akira1Saitoh once https://github.com/eclipse-openj9/openj9/issues/16309 is resolved, will we revert https://github.com/adoptium/aqa-tests/pull/4119?
@llxia yes, but I do not think we can resolve #16309 anytime soon.
Can this issue (#15268) be closed now given we have #16309 ?
With the test case change, it sounds like this issue should be closed.