openj9 21.0.7 oom
After upgrading OpenJ9 from 21.0.3 to 21.0.7, the application inexplicably experiences an OOM. During concurrent thread operations, the size of a ConcurrentHashMap suddenly jumps from 1 to Integer.MAX_VALUE. The code Sets.newConcurrentHashSet() is Guava's implementation, which uses a ConcurrentHashMap underneath. The HashMap underlying Sets.newHashSet(allocStacks) then OOMs because the source set's reported size is Integer.MAX_VALUE, so the new HashSet is sized to Integer.MAX_VALUE as well. Why does the size of the ConcurrentHashMap suddenly jump from 1 to Integer.MAX_VALUE during concurrent thread operations? This only occurs when using the shared class cache; if the shared class cache is removed, it does not occur.
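For context, Guava's Sets.newConcurrentHashSet() is documented to return a set backed by a ConcurrentHashMap. The JDK-only sketch below (using Collections.newSetFromMap, so no Guava dependency is needed) shows the equivalent construction; the point is that set.size() delegates to the backing map's internal element counter, which is where the corrupted value is coming from:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetSketch {
    public static void main(String[] args) {
        // JDK equivalent of Guava's Sets.newConcurrentHashSet():
        // a Set view over a ConcurrentHashMap, so set.size()
        // reads the map's internal element count.
        Set<String> allocStacks =
                Collections.newSetFromMap(new ConcurrentHashMap<>());
        allocStacks.add("stack-1");
        allocStacks.add("stack-1"); // duplicate, not added again
        System.out.println(allocStacks.size()); // prints 1
    }
}
```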
ConnectionAllocMonitor:

private Set<ConnectionAllocStack> allocStacks = Sets.newConcurrentHashSet();
private long lastOutTime = 0;

public void addConnectionAllocStack(ConnectionAllocStack allocStack) {
    allocStacks.add(allocStack);
    logger.info("====" + this + ": " + allocStacks.size());
    if (allocStacks.size() >= poolMaxSize) { // stray "&" in the original suggests a truncated second condition
        Set<ConnectionAllocStack> tempallocStacks = Sets.newHashSet(allocStacks);
        logger.info(">>>>>>>>> connection pool is full, output stack info:");
        tempallocStacks.forEach(stack -> {
            // (per-stack output elided in the original)
        });
        logger.info(">>>>>>>>> end connection output stack");
        lastOutTime = System.currentTimeMillis();
    }
}
log:
2025-05-30 09:56:14 453 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][Thread-9] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@234b0671: 1
2025-05-30 09:56:14 462 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][Thread-9] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@556d58e7: 1
2025-05-30 09:56:14 652 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][Thread-9] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@556d58e7: 1
2025-05-30 09:56:21 946 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][app-executor-lv:1-idx:1] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@556d58e7: 2147483647
2025-05-30 09:56:21 946 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][app-executor-lv:1-idx:3] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@556d58e7: 2147483647
2025-05-30 09:56:21 947 INFO [com.tmp.db.jdbi.ConnectionAllocMonitor][app-executor-lv:1-idx:6] - ====com.tmp.db.jdbi.ConnectionAllocMonitor@556d58e7: 2147483647
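The logged value 2147483647 is exactly Integer.MAX_VALUE. ConcurrentHashMap tracks its element count as a long (a base count plus counter cells), and size() clamps that sum into the int range, so any corruption that drives the internal count very high (or negative) reports exactly Integer.MAX_VALUE (or 0) rather than the true value. The sketch below mirrors that clamping logic; clampToInt is an illustrative helper replicating the formula in ConcurrentHashMap.size(), not a real ConcurrentHashMap API:

```java
public class SizeClampDemo {
    // Mirrors the clamping in java.util.concurrent.ConcurrentHashMap.size():
    // the internal long element count is squeezed into the int range.
    static int clampToInt(long n) {
        return (n < 0L) ? 0
                : (n > (long) Integer.MAX_VALUE) ? Integer.MAX_VALUE
                : (int) n;
    }

    public static void main(String[] args) {
        System.out.println(clampToInt(1L));             // sane count: prints 1
        System.out.println(clampToInt(8_000_000_000L)); // corrupted count: prints 2147483647
        System.out.println(clampToInt(-42L));           // negative count: prints 0
    }
}
```

This is why the log jumps straight to 2147483647: the internal counter was corrupted, and size() saturates at Integer.MAX_VALUE.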
Does using the shared class cache with the option -Xnoaot resolve the problem?
Also as a separate test, pls delete the existing shared class cache, and then run with -XX:-ShareOrphans along with a shared class cache to see if that resolves the problem.
Does using the shared class cache with the option -Xnoaot resolve the problem?
Yes, using -Xnoaot resolves the problem.
Also as a separate test, pls delete the existing shared class cache, and then run with -XX:-ShareOrphans along with a shared class cache to see if that resolves the problem.
Running without the shared class cache does not result in an OOM, but as soon as the shared class cache is added, the OOM occurs. What is the reason for the OOM?
If using -Xnoaot solves the problem, then there is an issue with the AOT code in the share cache.
Is it possible to get a reproducible test case?
@hzongaro fyi
I cannot provide a reproducible test case. My usage scenario is that the shared class cache is created during Docker image construction and then used directly in the business container. If the class cache is instead regenerated inside the business container and reused, the OOM does not occur.
By process of elimination, I found it was caused by this commit: https://github.com/eclipse-openj9/openj9/pull/20937/commits/83b0f899e5cf2fbd2bc26fdafc1b1df696f7a8ee @pshipton
That commit is from https://github.com/eclipse-openj9/openj9/pull/20937
What platform are you running on?
@keithc-ca fyi
What's the progress on this issue? I also encountered this problem
@DHbigfart pls confirm which platform and version you are running on.
<attribute name="gcPolicy" value="-Xgcpolicy:gencon" />
<attribute name="maxHeapSize" value="0x180000000" />
<attribute name="initialHeapSize" value="0x60000000" />
<attribute name="compressedRefs" value="true" />
<attribute name="compressedRefsDisplacement" value="0x0" />
<attribute name="compressedRefsShift" value="0x3" />
<attribute name="pageSize" value="0x1000" />
<attribute name="pageType" value="not used" />
<attribute name="requestedPageSize" value="0x1000" />
<attribute name="requestedPageType" value="not used" />
<attribute name="gcthreads" value="4" />
<attribute name="gcthreads Concurrent Mark" value="1" />
<attribute name="packetListSplit" value="1" />
<attribute name="cacheListSplit" value="1" />
<attribute name="splitFreeListSplitAmount" value="1" />
<attribute name="numaNodes" value="0" />
<system>
<attribute name="physicalMemory" value="8589934592" />
<attribute name="addressablePhysicalMemory" value="8589934592" />
<attribute name="container memory limit set" value="true" />
<attribute name="numCPUs" value="8" />
<attribute name="numCPUs active" value="4" />
<attribute name="architecture" value="amd64" />
<attribute name="os" value="Linux" />
<attribute name="osVersion" value="5.10.134-13.1.zncgsl6.x86_64" />
</system>
<vmargs>
<vmarg name="-Xlockword:mode=default,noLockword=java/lang/String,noLockword=java/util/MapEntry,noLockword=java/util/HashMap$Entry,noLockword..." />
<vmarg name="-XX:+EnsureHashed:java/lang/Class,java/lang/Thread" />
<vmarg name="-Xjcl:jclse29" />
<vmarg name="-Djava.class.path=." />
<vmarg name="-Xms1536m" />
<vmarg name="-Xmx6144m" />
<vmarg name="-Xdump:heap:events=systhrow+user,filter=java/lang/OutOfMemoryError,request=exclusive+prepwalk+compact,label=/home/zenap/dump/du..." />
<vmarg name="-Xdump:none" />
<vmarg name="-Xdump:system:events=gpf+abort+traceassert,range=1..0,priority=999,request=serial,label=/home/zenap/dump/core-dump-2025-05-27-0..." />
<vmarg name="-Xdump:heap:events=systhrow,filter=java/lang/OutOfMemoryError,range=1..1,priority=500,request=exclusive+compact+prepwalk,label=..." />
<vmarg name="-Xdump:heap:events=user,priority=500,request=exclusive+compact+prepwalk,label=/home/zenap/dump/dump-dump-user-2025-05-27-01-47-..." />
<vmarg name="-Xdump:java:events=systhrow,filter=java/lang/OutOfMemoryError,range=1..1,priority=400,request=exclusive+preempt,label=/home/zen..." />
<vmarg name="-Xdump:java:events=gpf+abort+traceassert+user,priority=400,request=exclusive+preempt,label=/home/zenap/dump/javacore-dump-2025-..." />
<vmarg name="-Xdump:snap:events=systhrow,filter=java/lang/OutOfMemoryError,range=1..1,priority=300,request=serial,label=/home/zenap/dump/sna..." />
<vmarg name="-Xdump:snap:events=gpf+abort+traceassert,priority=300,request=serial,label=/home/zenap/dump/snap-dump-2025-05-27-01-47-29.%seq...." />
<vmarg name="-Xverbosegclog:/home/zenap/gclog/gc-2025-05-27-01-47-29.log,1,10000" />
<vmarg name="-Xquickstart" />
<vmarg name="-Dfile.encoding=UTF-8" />
<vmarg name="-Xlp:objectheap:pagesize=4K" />
<vmarg name="-Xlp:codecache:pagesize=4K" />
<vmarg name="-XX:+UseContainerSupport" />
<vmarg name="-Xshareclasses:name=ZenapWarmCache,cacheDir=/sharedclasscache,readonly" />
<vmarg name="-Dotel.metrics.exporter=none" />
<vmarg name="-Xdump:tool:events=systhrow,opts=ASYNC,filter=java/lang/OutOfMemoryError,exec=sleep 120s && kill %pid && sleep ..." />
<vmarg name="-Dsun.java.launcher=SUN_STANDARD" />
<vmarg name="-Dsun.java.launcher.pid=1" />
</vmargs>
To proceed we need either a system core file produced at the time of the OOM, or a reproducible test case. Even a javacore file would be helpful.
Moving this out for now until we have a core file or some other dump file to work with.