SIGSEGV at ModuleEntryTable::purge_all_module_reads() called from G1ConcurrentMark
Summary
An application running on AdoptOpenJDK 11.0.10+9 crashes randomly with a SIGSEGV (0xb) error:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fb83ec2bfe3, pid=12547, tid=12555
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (11.0.10+9) (build 11.0.10+9)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (11.0.10+9, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xbf5fe3] ModuleEntryTable::purge_all_module_reads()+0x163
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# https://github.com/AdoptOpenJDK/openjdk-support/issues
#
Host: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz, 16 cores, 124G, CentOS Linux release 7.9.2009 (Core)
Time: Mon Jun 7 01:33:07 2021 CDT elapsed time: 290185.968425 seconds (3d 8h 36m 25s)
--------------- T H R E A D ---------------
Current thread (0x000055fff9b22000): VMThread "VM Thread" [stack: 0x00007fa0c3f19000,0x00007fa0c4019000] [id=12555]
Stack: [0x00007fa0c3f19000,0x00007fa0c4019000], sp=0x00007fa0c40170a0, free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xbf5fe3] ModuleEntryTable::purge_all_module_reads()+0x163
V [libjvm.so+0x60bcb4] ClassLoaderDataGraph::do_unloading(bool)+0x134
V [libjvm.so+0xe80e58] SystemDictionary::do_unloading(GCTimer*, bool)+0x1a8
V [libjvm.so+0x7ac2f0] G1ConcurrentMark::weak_refs_work(bool)+0x410
V [libjvm.so+0x7af4d8] G1ConcurrentMark::remark()+0xd8
V [libjvm.so+0xf3a891] VM_CGC_Operation::doit()+0x221
V [libjvm.so+0xf323f7] VM_Operation::evaluate()+0xe7
V [libjvm.so+0xf3877f] VMThread::evaluate_operation(VM_Operation*) [clone .constprop.66]+0xff
V [libjvm.so+0xf38cf8] VMThread::loop()+0x428
V [libjvm.so+0xf39193] VMThread::run()+0x73
V [libjvm.so+0xebd9bf] Thread::call_run()+0x14f
V [libjvm.so+0xc5dbde] thread_native_entry(Thread*)+0xee
siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
Steps to reproduce
The problem is random and has occurred twice in the last three days in an AWS-hosted environment. It looks similar to https://bugs.openjdk.java.net/browse/JDK-8251945, whose fix was included in the Oracle 11.0.10 release and is available in the http://hg.openjdk.java.net/jdk/jdk repository. We think merging it into AdoptOpenJDK should solve the issue. Can you confirm the theory and merge the changes?
Triaging info
Java version:
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)
What is your operating system and platform? CentOS Linux release 7.9.2009 (Core)
How did you install Java? Used a binary archive (tar.gz)
Did it work before? It crashed in prod env, worked in lower envs. The load was low. It was running fine with Oracle JDK 1.8.0_92
Did you test with other Java versions? No, it's random with no reproduction pattern till now.
@piotr-skalmierski Have you tried our 11 nightly builds?
This issue is random on the prod env. It does not happen on lower envs, so it's hard to test. Is the nightly build from the JDKUpdates/JDK11u branch? If so, I don't see any changes on that branch that improve ModuleEntry table access. I'd like to give it a try, but I think it has a better chance of success once https://bugs.openjdk.java.net/browse/JDK-8251945 is merged.
@piotr-skalmierski Ah yes, you'll have to wait until jdk11u-dev is merged into 11u (July timeframe).
Ah yes, you'll have to wait until jdk11u-dev is merged into 11u (July timeframe).
I'm not sure how that's going to help (any merge). https://bugs.openjdk.java.net/browse/JDK-8251945 is only fixed in 11.0.10-oracle, i.e. the Oracle private fork. I don't see this fixed anywhere in OpenJDK 11. The reproducer from the bug crashes with the latest 11.0.13-dev (current git jdk11u-dev tree):
$ ./build/linux-x86_64-normal-server-release/images/jdk/bin/java Test
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fb547dbfe4a, pid=31021, tid=31028
#
# JRE version: OpenJDK Runtime Environment (11.0.13) (build 11.0.13-internal+0-adhoc.sgehwolf.jdk11u-dev)
# Java VM: OpenJDK 64-Bit Server VM (11.0.13-internal+0-adhoc.sgehwolf.jdk11u-dev, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xc0ae4a] PackageEntry::purge_qualified_exports() [clone .part.0]+0x12a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/sgehwolf/Documents/openjdk/bugs/upstream/JDK-8251945-segv_modules_jfr/core.31021)
#
# An error report file with more information is saved as:
# /home/sgehwolf/Documents/openjdk/bugs/upstream/JDK-8251945-segv_modules_jfr/hs_err_pid31021.log
#
# If you would like to submit a bug report, please visit:
# https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=java-11-openjdk&version=32
#
Aborted (core dumped)
@jerboaa Are you able to create the JBS issue for an OpenJDK patch?
@karianna We could probably use JDK-8251945
@karianna I see the issue was added to June 2021 milestone. Do I understand correctly that it will be available in 11.0.12 release?
Will be in the July PSU (I just haven't set that milestone yet).
@karianna Can you confirm when the fix will be available? I see that the JDK-8251945 backport issue, JDK-8269082, has Fix Version/s: 11.0.13. To my understanding, its official GA date is October 2021, so I'm confused about the previously mentioned July date. Thanks, Piotr
@piotr-skalmierski You are correct, it will be the October PSU (it just missed the window for the July code freeze).
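For anyone who wants to check at startup whether a given runtime is new enough to carry the backport, here is a minimal sketch. It assumes the fix first ships in OpenJDK 11.0.13, as the Fix Version on JDK-8269082 suggests. The class name FixCheck and the method isFixedElevenUpdate are made up for illustration. It only answers for the 11 update line, and it cannot see vendor-specific backports such as the one in Oracle's 11.0.10.

```java
import java.lang.Runtime.Version;

public class FixCheck {
    // Assumption taken from this thread: JDK-8269082 lists Fix Version 11.0.13.
    static final Version FIRST_FIXED_11U = Version.parse("11.0.13");

    // Hypothetical helper: true only for JDK 11 update releases at or above 11.0.13.
    // Other feature releases return false because this sketch does not cover them.
    // Pre-release builds (e.g. "11.0.13-ea") sort below 11.0.13 and also return false.
    static boolean isFixedElevenUpdate(Version v) {
        return v.feature() == 11
                && v.compareToIgnoreOptional(FIRST_FIXED_11U) >= 0;
    }

    public static void main(String[] args) {
        Version current = Runtime.version();
        System.out.println("Runtime version: " + current);
        System.out.println("At or above 11.0.13 on the 11 line: "
                + isFixedElevenUpdate(current));
    }
}
```

A version-number check like this is only a rough guard. The vendor's release notes remain the authoritative source for whether a particular build contains the fix.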