SIGSEGV at ModuleEntryTable::purge_all_module_reads() called from G1ConcurrentMark

Open piotr-skalmierski opened this issue 4 years ago • 10 comments

Summary

An application running on AdoptOpenJDK 11.0.10+9 crashes randomly with a SIGSEGV (0xb) error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb83ec2bfe3, pid=12547, tid=12555
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (11.0.10+9) (build 11.0.10+9)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (11.0.10+9, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xbf5fe3]  ModuleEntryTable::purge_all_module_reads()+0x163
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-support/issues
#

Host: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz, 16 cores, 124G, CentOS Linux release 7.9.2009 (Core)
Time: Mon Jun 7 01:33:07 2021 CDT elapsed time: 290185.968425 seconds (3d 8h 36m 25s)

--------------- T H R E A D ---------------

Current thread (0x000055fff9b22000): VMThread "VM Thread" [stack: 0x00007fa0c3f19000,0x00007fa0c4019000] [id=12555]

Stack: [0x00007fa0c3f19000,0x00007fa0c4019000], sp=0x00007fa0c40170a0, free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xbf5fe3]  ModuleEntryTable::purge_all_module_reads()+0x163
V  [libjvm.so+0x60bcb4]  ClassLoaderDataGraph::do_unloading(bool)+0x134
V  [libjvm.so+0xe80e58]  SystemDictionary::do_unloading(GCTimer*, bool)+0x1a8
V  [libjvm.so+0x7ac2f0]  G1ConcurrentMark::weak_refs_work(bool)+0x410
V  [libjvm.so+0x7af4d8]  G1ConcurrentMark::remark()+0xd8
V  [libjvm.so+0xf3a891]  VM_CGC_Operation::doit()+0x221
V  [libjvm.so+0xf323f7]  VM_Operation::evaluate()+0xe7
V  [libjvm.so+0xf3877f]  VMThread::evaluate_operation(VM_Operation*) [clone .constprop.66]+0xff
V  [libjvm.so+0xf38cf8]  VMThread::loop()+0x428
V  [libjvm.so+0xf39193]  VMThread::run()+0x73
V  [libjvm.so+0xebd9bf]  Thread::call_run()+0x14f
V  [libjvm.so+0xc5dbde]  thread_native_entry(Thread*)+0xee

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
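
To compare crashes across several hs_err files, the problematic frame can be extracted programmatically. The following is a minimal sketch, assuming the header layout shown in the report above (a "# Problematic frame:" line followed by the frame line). The class name `ProblematicFrame` and the method `find` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK tooling.
final class ProblematicFrame {
    // Matches a frame line such as:
    // "# V  [libjvm.so+0xbf5fe3]  ModuleEntryTable::purge_all_module_reads()+0x163"
    private static final Pattern FRAME =
        Pattern.compile("^#\\s+\\S+\\s+\\[(\\S+?)\\+(0x[0-9a-fA-F]+)\\]\\s+(.*)$");

    // Returns the symbol (with its offset) from the line that follows
    // "# Problematic frame:", or empty if the log has no such line.
    static Optional<String> find(List<String> lines) {
        for (int i = 0; i < lines.size() - 1; i++) {
            if (lines.get(i).startsWith("# Problematic frame:")) {
                Matcher m = FRAME.matcher(lines.get(i + 1));
                if (m.matches()) {
                    return Optional.of(m.group(3).trim());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an hs_err_pid<N>.log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(find(lines).orElse("no problematic frame found"));
    }
}
```

Running it against each crash log shows at a glance whether two crashes share the same top frame.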

Steps to reproduce

The problem is random and has occurred twice in the last three days in an AWS-hosted environment. It looks similar to https://bugs.openjdk.java.net/browse/JDK-8251945, whose fix was included in the Oracle 11.0.10 release and is available on the http://hg.openjdk.java.net/jdk/jdk branch. We think merging that fix into AdoptOpenJDK should solve the issue. Can you confirm the theory and merge the changes?

Triaging info

Java version:
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)

What is your operating system and platform? CentOS Linux release 7.9.2009 (Core)

How did you install Java? Used a binary archive (tar.gz)

Did it work before? It crashed in prod env, worked in lower envs. The load was low. It was running fine with Oracle JDK 1.8.0_92

Did you test with other Java versions? No, it's random with no reproduction pattern till now.

piotr-skalmierski avatar Jun 08 '21 16:06 piotr-skalmierski

@piotr-skalmierski Have you tried our 11 nightly builds?

karianna avatar Jun 08 '21 16:06 karianna

This issue is random in the prod env. It does not happen in lower envs, so it's hard to test. Is the nightly build from the JDKUpdates/JDK11u branch? If so, I don't see any changes that improve ModuleEntry table access on this branch. I'd like to give it a try, but I think it has a better chance of success with https://bugs.openjdk.java.net/browse/JDK-8251945 merged.

piotr-skalmierski avatar Jun 08 '21 16:06 piotr-skalmierski

@piotr-skalmierski Ah yes, you'll have to wait until jdk11u-dev is merged into 11u (July timeframe).

karianna avatar Jun 08 '21 17:06 karianna

Ah yes, you'll have to wait until jdk11u-dev is merged into 11u (July timeframe).

I'm not sure how any merge is going to help. https://bugs.openjdk.java.net/browse/JDK-8251945 is only fixed in 11.0.10-oracle, i.e. the Oracle private fork. I don't see this fixed anywhere in OpenJDK 11. The reproducer from the bug crashes with the latest 11.0.13-dev (current git jdk11u-dev tree):

$ ./build/linux-x86_64-normal-server-release/images/jdk/bin/java Test
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb547dbfe4a, pid=31021, tid=31028
#
# JRE version: OpenJDK Runtime Environment (11.0.13) (build 11.0.13-internal+0-adhoc.sgehwolf.jdk11u-dev)
# Java VM: OpenJDK 64-Bit Server VM (11.0.13-internal+0-adhoc.sgehwolf.jdk11u-dev, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc0ae4a]  PackageEntry::purge_qualified_exports() [clone .part.0]+0x12a
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/sgehwolf/Documents/openjdk/bugs/upstream/JDK-8251945-segv_modules_jfr/core.31021)
#
# An error report file with more information is saved as:
# /home/sgehwolf/Documents/openjdk/bugs/upstream/JDK-8251945-segv_modules_jfr/hs_err_pid31021.log
#
# If you would like to submit a bug report, please visit:
#   https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=java-11-openjdk&version=32
#
Aborted (core dumped)

jerboaa avatar Jun 09 '21 12:06 jerboaa

@jerboaa Are you able to create the JBS issue for an OpenJDK patch?

karianna avatar Jun 09 '21 13:06 karianna

@karianna We could probably use JDK-8251945

jerboaa avatar Jun 09 '21 15:06 jerboaa

@karianna I see the issue was added to the June 2021 milestone. Do I understand correctly that it will be available in the 11.0.12 release?

piotr-skalmierski avatar Jun 11 '21 12:06 piotr-skalmierski

Will be in the July PSU (I just haven't set that milestone yet).

karianna avatar Jun 11 '21 14:06 karianna

@karianna Can you confirm when the fix will be available? I see that the JDK-8251945 backport issue, JDK-8269082, has Fix Version/s: 11.0.13. To my understanding, the official GA date for that release is October 2021, so I'm confused about the previously mentioned July date. Thanks, Piotr

piotr-skalmierski avatar Jul 15 '21 16:07 piotr-skalmierski

@piotr-skalmierski You are correct, it will be in the October PSU (it just missed the window for the July code freeze).

karianna avatar Jul 15 '21 16:07 karianna
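
For deployments that want to flag JVMs predating the backport, the running version can be compared against 11.0.13 at startup. This is a minimal sketch, assuming the Fix Version of JDK-8269082 quoted in this thread (11.0.13) is where the fix lands in OpenJDK 11u. The class name `BackportCheck` and the method `predatesBackport` are hypothetical. The check only looks at version numbers, so it says nothing about vendor forks that picked up the fix earlier, such as the Oracle 11.0.10 release mentioned above.

```java
// Hypothetical helper: flags OpenJDK 11u builds older than the assumed fix version.
final class BackportCheck {
    // Assumption: 11.0.13 is taken from the Fix Version/s field of JDK-8269082
    // as quoted in this thread.
    static final Runtime.Version FIRST_FIXED_11U = Runtime.Version.parse("11.0.13");

    // True only for 11.x versions below 11.0.13; other feature releases are
    // outside the scope of this check and return false.
    static boolean predatesBackport(Runtime.Version v) {
        return v.feature() == 11 && v.compareToIgnoreOptional(FIRST_FIXED_11U) < 0;
    }

    public static void main(String[] args) {
        Runtime.Version v = Runtime.version();
        if (predatesBackport(v)) {
            System.out.println(v + " predates 11.0.13; the JDK-8251945 backport is likely missing");
        } else {
            System.out.println(v + " is not flagged by this check");
        }
    }
}
```

`Runtime.Version.feature()` requires Java 10 or later, which holds for any 11u build.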