openj9
openj9 copied to clipboard
openj9 java11 AsyncGetCallTrace segfault, OOM, assert
Java -version output
openjdk version "11.0.12" 2021-07-20 IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.12+7) Eclipse OpenJ9 VM 11.0.12.0 (build openj9-0.27.0, JRE 11 Linux amd64-64-Bit Compressed References 20210729_228 (JIT enabled, AOT enabled) OpenJ9 - 1851b0074 OMR - 9db1c870d JCL - 21849e2ca0 based on jdk-11.0.12+7)
Summary of problem
In our customer env, they encountered "java/lang/OutOfMemoryError" "native memory exhausted"
. It happened 13 times in 8 hours.
Java process is running in container in AWS EKS cluster.
Diagnostic files
There are jitdump, javacore, heapdum in IBM Box https://ibm.box.com/s/3ej7gzrpevtnyrs5hyj7bxtk5w40irfp NATIVEMEMINFO, JIT used a lot of memory.
2MEMUSER +--JIT: 278,216,584 bytes / 352 allocations
2MEMUSER | |
3MEMUSER | +--JIT Code Cache: 268,435,456 bytes / 1 allocation
2MEMUSER | |
3MEMUSER | +--JIT Data Cache: 2,097,216 bytes / 1 allocation
2MEMUSER | |
3MEMUSER | +--Other: 7,683,912 bytes / 350 allocations
MEMINFO:
1STSEGTYPE JIT Code Cache
NULL segment start alloc end type size
1STSEGMENT 0x00007F9CC415CCD8 0x00007F9CA8F9E000 0x00007F9CA90835B0 0x00007F9CB8F9E000 0x00000068 0x0000000010000000
NULL
1STSEGTOTAL Total memory: 268435456 (0x0000000010000000)
1STSEGINUSE Total memory in use: 939440 (0x00000000000E55B0)
1STSEGFREE Total memory free: 267496016 (0x000000000FF1AA50)
1STSEGLIMIT Allocation limit: 268435456 (0x0000000010000000)
NULL
1STSEGTYPE JIT Data Cache
NULL segment start alloc end type size
1STSEGMENT 0x00007F9CC415D168 0x00007F9CA2DFF030 0x00007F9CA2FFF030 0x00007F9CA2FFF030 0x00000048 0x0000000000200000
NULL
1STSEGTOTAL Total memory: 2097152 (0x0000000000200000)
1STSEGINUSE Total memory in use: 2097152 (0x0000000000200000)
1STSEGFREE Total memory free: 0 (0x0000000000000000)
1STSEGLIMIT Allocation limit: 402653184 (0x0000000018000000)
Error message
We also observed some message in console when java crashed
2022-07-25T14:09:24Z [queryScriptsThread-1] INFO TestListLoader:queryTests - X-TransactionID: 09a08c25-5ffa-4c6b-8ade-239f9931e537, QueryTime From: 1658757984432, UTC Time: 2022-07-25T14:06:24.432
Unhandled exception
Type=Segmentation error vmState=0x0002000f
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FDDBCFA3DD0 Handler2=00007FDDBCD03AE0 InaccessibleAddress=0000003000000028
RDI=00007FDD9CF0D9E0 RSI=0000000024600002 RAX=0000003000000020 RBX=00007FDD85698910
RCX=00007FDD9CF0DA00 RDX=0000000000000000 R8=0000000000000000 R9=0000000000000000
R10=00007FDDB6D5FE00 R11=00007FDD103B7899 R12=0000000000000000 R13=000000000086FD00
R14=0000000004600002 R15=00007FDDB90E7018
RIP=00007FDDBCFE2C3A GS=0000 FS=0000 RSP=00007FDD85698890
EFlags=0000000000010202 CS=0033 RBP=0000000000000000 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000003000000028
xmm0 0000000300000003 (f: 3.000000, d: 6.365987e-314)
xmm1 0000000000000003 (f: 3.000000, d: 1.482197e-323)
xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7 0000000000000094 (f: 148.000000, d: 7.312172e-322)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 ca62c1d6ca62c1d6 (f: 3395469824.000000, d: -2.193092e+50)
xmm12 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm13 08090a0b0c0d0e0f (f: 202182160.000000, d: 5.924543e-270)
xmm14 597a2efb7188e8a2 (f: 1904797824.000000, d: 1.081798e+123)
xmm15 35a343df8901ff20 (f: 2298609408.000000, d: 2.574558e-50)
Module=/opt/ibm/openjdk/lib/default/libj9vm29.so
Module_base_address=00007FDDBCF65000
Target=2_90_20210729_228 (Linux 5.15.32-flatcar)
CPU=amd64 (32 logical CPUs) (0x1ec389f000 RAM)
----------- Stack Backtrace -----------
(0x00007FDDBCFE2C3A [libj9vm29.so+0x7dc3a])
(0x00007FDDB6D5FEC6 [libj9gc29.so+0x41ec6])
(0x00007FDDB6D57D9D [libj9gc29.so+0x39d9d])
(0x00007FDDB6D56AAF [libj9gc29.so+0x38aaf])
(0x00007FDDB6D59622 [libj9gc29.so+0x3b622])
(0x00007FDDB6EB248B [libj9gc29.so+0x19448b])
(0x00007FDDB6E4FA1E [libj9gc29.so+0x131a1e])
(0x00007FDDB6E8393D [libj9gc29.so+0x16593d])
(0x00007FDDB6EB6D35 [libj9gc29.so+0x198d35])
(0x00007FDDB6EB8A70 [libj9gc29.so+0x19aa70])
(0x00007FDDB6E30566 [libj9gc29.so+0x112566])
(0x00007FDDB6F163FD [libj9gc29.so+0x1f83fd])
(0x00007FDDB6EF44D8 [libj9gc29.so+0x1d64d8])
(0x00007FDDB6E54795 [libj9gc29.so+0x136795])
(0x00007FDDB6E54966 [libj9gc29.so+0x136966])
(0x00007FDDB6E53AB5 [libj9gc29.so+0x135ab5])
(0x00007FDDB6E5956B [libj9gc29.so+0x13b56b])
(0x00007FDDB6D65789 [libj9gc29.so+0x47789])
(0x00007FDDBD00883B [libj9vm29.so+0xa383b])
(0x00007FDDBCFF4EF5 [libj9vm29.so+0x8fef5])
(0x00007FDDBD0A0192 [libj9vm29.so+0x13b192])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/07/25 14:09:24 - please wait.
Do you have any system core files? I don't think we can figure anything out from the diagnostic files already provided. If you have a system core, it needs to be processed with jextract or jpackcore (https://www.eclipse.org/openj9/docs/tool_jextract/)
Is it possible to use the latest JVM? Which may provide more information about the native stack, assuming the problem still occurs when using the latest. https://developer.ibm.com/languages/java/semeru-runtimes/downloads/?version=11
@pshipton I uploaded a zip file including all of the jitdump, headump and javacore https://ibm.box.com/s/arhondqtf9iodxmvlyfah8ewbvrrep1u. We did not see core dumps. We are using -XX:+HeapDumpOnOutOfMemory -XX:HeapDumpPath=xxx
.
Another thing strange is that in Current thread
, I see Unable to walk in-flight data on call stack
Unable to obtain lock context information
as below.
1XMCURTHDINFO Current thread
3XMTHREADINFO "testScheduleThread-1" J9VMThread:0x0000000000109400, omrthread_t:0x00007F69750F8D78, java/lang/Thread:0x00000000F85A1710, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x20, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x38, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00001020)
3XMTHREADINFO2 (native stack address range from:0x00007F691ABE0000, to:0x00007F691AC20000, size:0x40000)
3XMCPUTIME CPU usage total: 12.526471354 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
1INTERNAL Unable to obtain lock context information
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/lang/Throwable.fillInStackTrace(Native Method)
4XESTACKTRACE at java/lang/Throwable.<init>(Throwable.java:90(Compiled Code))
4XESTACKTRACE at java/lang/Throwable.<init>(Throwable.java:101(Compiled Code))
4XESTACKTRACE at java/lang/Exception.<init>(Exception.java:66(Compiled Code))
4XESTACKTRACE at java/lang/RuntimeException.<init>(RuntimeException.java:62(Compiled Code))
1INTERNAL Unable to walk in-flight data on call stack
A heapdump isn't useful for "native memory exhausted"
. The heapdump contains the object heap, but the OOM is for native memory, which is outside of the object heap.
Besides the "java/lang/OutOfMemoryError" "native memory exhausted"
and gpf, I also see an assert openj9/runtime/util/mthutil.c:600: ((((uintptr_t)-1) != methodIndex))
.
Plus I notice this has something to do with AsyncGetCallTrace. The 0.27 release is too old, there are fixes related to AsyncGetCallTrace in newer versions (0.31 and later), such as https://github.com/eclipse-openj9/openj9/pull/14308 which fixes https://github.com/eclipse-openj9/openj9/issues/13838. Pls update to the latest version. If there are still problems with the newer versions, we'll need to figure out how to get a system core file.
This is the decoded native stack trace for the GPF:
` CPU=amd64 (32 logical CPUs) (0x1ec389f000 RAM) ----------- Stack Backtrace ----------- (0x00007FDDBCFE2C3A [libj9vm29.so+0x7dc3a]) : 0x7dc3a <walkStackFrames+1386> [../../../../../openj9/runtime/vm/swalk.c:1134]
(0x00007FDDB6D5FEC6 [libj9gc29.so+0x41ec6]) : 0x41ec6 <GC_VMThreadStackSlotIterator [../../../../../openj9/runtime/gc_structs/VMThreadStackSlotIterator.cpp:115]
(0x00007FDDB6D57D9D [libj9gc29.so+0x39d9d]) : 0x39d9d <MM_RootScanner [../../../../../openj9/runtime/gc_base/RootScanner.cpp:521]
(0x00007FDDB6D56AAF [libj9gc29.so+0x38aaf]) : 0x38aaf <MM_RootScanner [../../../../../openj9/runtime/gc_base/RootScanner.cpp:488]
(0x00007FDDB6D59622 [libj9gc29.so+0x3b622]) : 0x3b622 <MM_RootScanner [../../../../../openj9/runtime/gc_base/RootScanner.cpp:921]
(0x00007FDDB6EB248B [libj9gc29.so+0x19448b]) : 0x19448b <MM_Scavenger [../../../../../../openj9/runtime/gc_glue_java/ScavengerRootScanner.hpp:209]
(0x00007FDDB6E4FA1E [libj9gc29.so+0x131a1e]) : 0x131a1e <MM_ParallelDispatcher [../../../../../../omr/gc/base/ParallelDispatcher.cpp:589]
(0x00007FDDB6E8393D [libj9gc29.so+0x16593d]) : 0x16593d <MM_Scavenger [../../../../../../omr/gc/base/standard/Scavenger.cpp:581]
(0x00007FDDB6EB6D35 [libj9gc29.so+0x198d35]) : 0x198d35 <MM_Scavenger [../../../../../../omr/gc/base/standard/Scavenger.cpp:4199]
(0x00007FDDB6EB8A70 [libj9gc29.so+0x19aa70]) : 0x19aa70 <MM_Scavenger [../../../../../../omr/gc/base/standard/Scavenger.cpp:4729]
(0x00007FDDB6E30566 [libj9gc29.so+0x112566]) : 0x112566 <MM_Collector [../../../../../../omr/gc/base/Collector.cpp:502]
(0x00007FDDB6F163FD [libj9gc29.so+0x1f83fd]) : 0x1f83fd <MM_MemorySubSpaceSemiSpace [../../../../../../omr/gc/base/AllocateDescription.hpp:152]
(0x00007FDDB6EF44D8 [libj9gc29.so+0x1d64d8]) : 0x1d64d8 <MM_MemorySubSpaceGeneric [../../../../../../omr/gc/base/MemorySubSpaceGeneric.cpp:377]
(0x00007FDDB6E54795 [libj9gc29.so+0x136795]) : 0x136795 <MM_TLHAllocationSupport [../../../../../../omr/gc/base/TLHAllocationSupport.cpp:239]
(0x00007FDDB6E54966 [libj9gc29.so+0x136966]) : 0x136966 <MM_TLHAllocationSupport [../../../../../../omr/gc/base/TLHAllocationSupport.hpp:115]
(0x00007FDDB6E53AB5 [libj9gc29.so+0x135ab5]) : 0x135ab5 <MM_TLHAllocationInterface [../../../../../../omr/gc/base/TLHAllocationInterface.cpp:193]
(0x00007FDDB6E5956B [libj9gc29.so+0x13b56b]) : 0x13b56b <OMR_GC_AllocateObject(OMR_VMThread*, MM_AllocateInitialization*)+523> [../../../../../../omr/gc/base/AllocateInitialization.hpp:201]
(0x00007FDDB6D65789 [libj9gc29.so+0x47789]) : 0x47789 <J9AllocateObject(J9VMThread*, J9Class*, uintptr_t)+441> [../../../../../openj9/runtime/gc_modron_startup/mgcalloc.cpp:409]
(0x00007FDDBD00883B [libj9vm29.so+0xa383b]) : 0xa383b <VM_BytecodeInterpreterCompressed [../../../../../openj9/runtime/vm/BytecodeInterpreter.hpp:357]
(0x00007FDDBCFF4EF5 [libj9vm29.so+0x8fef5]) : 0x8fef5 <bytecodeLoopCompressed(J9VMThread*)+149> [../../../../../openj9/runtime/vm/BytecodeInterpreter.inc:113]
(0x00007FDDBD0A0192 [libj9vm29.so+0x13b192]) : 0x13b192 <c_cInterpreter+82> [/home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-x64-openj9/workspace/build/src/build/linux-x86_64-normal-server-release/vm/runtime/vm/xcinterp.s:161] ~ ` It means we found a bogus object pointer in Java stack (and likely need our JIT team to investigate further) and we need the system core file to get the details.
hello @pshipton and @manqingl, thanks for your analysis. Do you think we need to replace JVM arguments -XX:+HeapDumpOnOutOfMemory -XX:HeapDumpPath=xxx
with -Xdump:system+java+heap+jit:file=./logs
? Or any other arguments do you suggest use to collect enough info if it occurs again? Since it is a customer environment, I hope that we can provide one build with all the required updates, and also the latest JDK if you think it is required.
This looks like a duplicate of https://github.com/eclipse-openj9/openj9/issues/13838, which is fixed in 0.31 or later. Pls update the JVM. There is no point in trying to collect more diagnostics on a JVM which has this known problem.
OK, upgrade JDK to the latest.
openjdk version "11.0.15" 2022-04-19
IBM Semeru Runtime Open Edition 11.0.15.0 (build 11.0.15+10)
Eclipse OpenJ9 VM 11.0.15.0 (build openj9-0.32.0, JRE 11 Linux amd64-64-Bit Compressed References 20220422_425 (JIT enabled, AOT enabled)