
[Native Image] G1 GC application crashes with segfault after a second heap dump using `kill -SIGUSR1 <pid>`

Open • ThoSap opened this issue 1 year ago • 1 comment

Describe the Issue

When I create heap dumps at run time with `kill -SIGUSR1 <pid>`, my application running with the G1 GC always crashes with a segfault on the second heap dump if some time (for example, 30 minutes) has passed since the first one. I followed the steps in https://www.graalvm.org/jdk21/reference-manual/native-image/guides/create-heap-dump/#:~:text=Create%20Heap%20Dumps%20with%20SIGUSR1%20(Linux/macOS%20only)

After this issue first occurred, I added `-XX:+PrintGC -XX:+VerboseGC` so that the regular GC runs leading up to the crash during the full-GC heap dump are visible in the output.

Using the latest version of GraalVM can resolve many issues.

Latest JDK 21.0.5 version

GraalVM Version

Built using the container image container-registry.oracle.com/graalvm/native-image:21.0.5, which resolves to:
  • Tag: container-registry.oracle.com/graalvm/native-image:21.0.5-ol9-20241015
  • Oracle Container Registry image ID: 1e7548c6ff98
  • Repo digest: container-registry.oracle.com/graalvm/native-image@sha256:c10d7f10da5bfed22ad02887e96095b1c08d42c8e6bf03d774cded332fa444a9
  • Image ID (output from docker images): sha256:f8a4dcaa07bfa79d6e8f4dced149d28a86e802fcdfe20586b69dd93570c2e82d

java version "21.0.5" 2024-10-15 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.5+9.1 (build 21.0.5+9-LTS-jvmci-23.1-b48)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.5+9.1 (build 21.0.5+9-LTS-jvmci-23.1-b48, mixed mode, sharing)

Operating System and Version

Linux mycontainername 5.15.0-209.161.7.2.el8uek.x86_64 #2 SMP Tue Aug 20 10:44:07 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Diagnostic Flag Confirmation

  • [ ] I tried the -H:ThrowMissingRegistrationErrors= flag.

Run Command

./application -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -XX:+PrintGC -XX:+VerboseGC

Expected Behavior

I should be able to create multiple heap dumps at run time using `kill -SIGUSR1 <pid>` without a segfault.

Actual Behavior

Executing `kill -SIGUSR1 <pid>` a second time, after some time has passed since the first dump (for example, 30 minutes), crashes the application with a segfault.
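The two-dump sequence described above could be scripted roughly as follows. This is a minimal sketch, assuming the PID of the running native executable is known and the image was built with `--enable-monitoring=heapdump`; the function name `dump_twice` and its parameters are hypothetical and not part of GraalVM or Quarkus.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GraalVM/Quarkus): triggers two heap dumps via
# SIGUSR1 with a pause in between, then reports whether the process still exists.
#
# Usage: dump_twice PID [DELAY_SECONDS] [SETTLE_SECONDS]
#   DELAY_SECONDS  pause between the two dumps (default 1800, i.e. ~30 minutes)
#   SETTLE_SECONDS pause after the second dump before checking (default 10)
dump_twice() {
  local pid="$1"
  local delay="${2:-1800}"
  local settle="${3:-10}"

  kill -USR1 "$pid"    # first heap dump (same signal as kill -SIGUSR1)
  sleep "$delay"
  kill -USR1 "$pid"    # second heap dump; the step reported to segfault under G1
  sleep "$settle"

  # kill -0 sends no signal; it only checks whether the PID can still be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    echo "alive"
  else
    echo "gone"
  fi
}
```

For example, `dump_twice 12345` (with the PID of the running `./application`) waits the default 30 minutes between the two dumps, while a delay of only a few minutes corresponds to the case reported not to crash.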

Steps to Reproduce

My application was built with Quarkus 3.15.1 and container-registry.oracle.com/graalvm/native-image:21.0.5-ol9-20241015 using the following native-image build command. I enabled heap dumps with `--enable-monitoring=heapdump,jfr,jvmstat` and created them as described in https://www.graalvm.org/jdk21/reference-manual/native-image/guides/create-heap-dump/#:~:text=Create%20Heap%20Dumps%20with%20SIGUSR1%20(Linux/macOS%20only)

/usr/lib64/graalvm/graalvm-java21/bin/native-image \
-J-Dsun.nio.ch.maxUpdateArraySize=100 \
-J-Dlogging.initial-configurator.min-level=500 \
-J-DCoordinatorEnvironmentBean.transactionStatusManagerEnable=false \
-J-Dio.quarkus.caffeine.graalvm.recordStats=true \
-J-Djava.util.logging.manager=org.jboss.logmanager.LogManager \
-J-Dvertx.logger-delegate-factory-class-name=io.quarkus.vertx.core.runtime.VertxLogDelegateFactory \
-J-Dvertx.disableDnsResolver=true \
-J-Dio.netty.leakDetection.level=DISABLED \
-J-Dio.netty.allocator.maxOrder=3 \
-J-Duser.language=en \
-J-Duser.country=US \
-J-Dfile.encoding=UTF-8 \
--features=io.quarkus.runner.Feature,io.quarkus.runtime.graal.DisableLoggingFeature,oracle.jdbc.nativeimage.NativeImageFeature,io.quarkus.caffeine.runtime.graal.CacheConstructorsFeature,io.quarkus.jdbc.postgresql.runtime.graal.SQLXMLFeature \
-J--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED \
-J--add-exports=java.security.jgss/sun.security.jgss=ALL-UNNAMED \
-J--add-opens=java.base/java.text=ALL-UNNAMED \
-J--add-opens=java.base/java.io=ALL-UNNAMED \
-J--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
-J--add-opens=java.base/java.util=ALL-UNNAMED \
-H:+UnlockExperimentalVMOptions \
-H:BuildOutputJSONFile=myapi-4.20.1-runner-build-output-stats.json \
-H:-UnlockExperimentalVMOptions \
-H:+UnlockExperimentalVMOptions \
-H:+GenerateBuildArtifactsFile \
-H:-UnlockExperimentalVMOptions \
--strict-image-heap \
--color=always \
-march=native \
--enable-sbom \
--gc=G1 \
--initialize-at-run-time=io.trino.jdbc.TrinoDriver \
-H:+UnlockExperimentalVMOptions \
-H:+AllowFoldMethods \
-H:-UnlockExperimentalVMOptions \
-J-Djava.awt.headless=true \
--no-fallback \
-H:+UnlockExperimentalVMOptions \
-H:+ReportExceptionStackTraces \
-H:-UnlockExperimentalVMOptions \
-J-Xmx18g \
-H:+AddAllCharsets \
--enable-url-protocols=http,https \
-H:NativeLinkerOption=-no-pie \
--enable-monitoring=heapdump,jfr,jvmstat \
-H:+UnlockExperimentalVMOptions \
-H:-UseServiceLoaderFeature \
-H:-UnlockExperimentalVMOptions \
-J--add-exports=org.graalvm.nativeimage/org.graalvm.nativeimage.impl=ALL-UNNAMED \
--exclude-config \
com\.oracle\.database\.jdbc \
/META-INF/native-image/native-image\.properties \
--exclude-config \
com\.oracle\.database\.jdbc \
/META-INF/native-image/reflect-config\.json \
--exclude-config \
io\.netty\.netty-codec \
/META-INF/native-image/io\.netty/netty-codec/generated/handlers/reflect-config\.json \
--exclude-config \
io\.netty\.netty-handler \
/META-INF/native-image/io\.netty/netty-handler/generated/handlers/reflect-config\.json \
myapi-4.20.1-runner \
-jar \
myapi-4.20.1-runner.jar

The native executable runs in the base container image registry.access.redhat.com/ubi9/ubi-minimal:9.4.

Additional Context

No response

Run-Time Log Output and Error Messages

https://gist.github.com/ThoSap/e6b01fe3677c2e89a3716a1ac66e85eb

ThoSap, Oct 16 '24 15:10

To clarify: if I create the second heap dump immediately after the first one, or only a few minutes later, the application does not crash. It only crashes if I create the second heap dump after 20 to 30 minutes of run time.

ThoSap, Oct 17 '24 07:10

Hi @ThoSap,

Thank you for reaching out to us! Could you please provide a concise reproducer for this issue that I can test locally, along with the steps needed to reproduce it?

selhagani, Oct 21 '24 15:10

As I haven't heard back from you in over three weeks, I will be closing this issue for now. If you need further assistance or have any updates, please feel free to reach out at any time. Thank you for your understanding.

selhagani, Nov 28 '24 09:11