[GR-58225] Internal Error guarantee failed: "can not load classes with compiler thread" when using NFI Panama
Describe GraalVM and your environment:
- GraalVM version or commit id if built from source: 07d876aff9cb05558784449d0e6b8739aaac9fee
- TruffleRuby version: truffleruby 24.2.0-dev-37fdafbf, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
- CE or EE: EE/GFTC
- JDK version: JDK 24
- OS and OS Version: Ubuntu 24.04
- Architecture: amd64
- The output of `java -Xinternalversion`:
Java HotSpot(TM) 64-Bit Server VM (24+13-jvmci-b01) for linux-amd64 JRE (24+13-jvmci-b01), built on 2024-08-29T16:25:44Z with gcc 13.2.0
Have you verified this issue still happens when using the latest snapshot? Yes. I only see the issue with snapshots since Puma support isn't yet available in a release.
Describe the issue
While running a Rails application with a GFTC EA snapshot of TruffleRuby and the Panama backend enabled, I sometimes see the JVM crash. Unfortunately, since this happens during application boot, I don't have the hs_err log. I'm trying to work with our infrastructure team to preserve it; currently, when the JVM crashes, the deployment is halted and the container is immediately discarded. I do have a copy of the core dump, but it is too large to attach to the issue.
Code snippet or code repository that reproduces the issue
Puma starting in single mode...
* Puma version: 6.4.2 (truffleruby 24.2.0-dev-37fdafbf - ruby 3.2.4) ("The Eagle of Durango")
* Min threads: 120
* Max threads: 120
* Environment: staging
* PID: 91
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (systemDictionary.cpp:631), pid=91, tid=111
# guarantee(THREAD->can_call_java()) failed: can not load classes with compiler thread: class=com/oracle/truffle/nfi/backend/panama/NativePointer, classloader=jdk/internal/loader/ClassLoaders$AppClassLoader
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 24-dev+13.1 (24.0+13) (build 24+13-jvmci-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 24-dev+13.1 (24+13-jvmci-b01, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xe90c86]  SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, JavaThread*)+0x776
[thread 110 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/var/lib/toolbox/crash-reporter/crash-reporter-binary %p %P %s %E" (or dumping to /app/core.91)
#
# An error report file with more information is saved as:
# /app/hs_err_pid91.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
Steps to reproduce the issue (please include both build steps and run steps)
- Install the latest TruffleRuby GFTC EA build (e.g., with rbenv + ruby-build: `rbenv install truffleruby+graalvm-dev`)
- Enable the Panama backend: `export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"`
- Boot an application that uses native extensions (see the consolidated sketch below)
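For reference, a consolidated shell sketch of the steps above. The rbenv/ruby-build install method and the `bundle exec puma` boot command are assumptions; any application that loads native C extensions should exercise the same path.

```sh
# Sketch of the reproduction steps; the rbenv install method and the Puma
# boot command are assumptions, not the only way to trigger the crash.

# 1. Install the latest TruffleRuby GFTC EA (dev) build
rbenv install truffleruby+graalvm-dev
rbenv local truffleruby+graalvm-dev

# 2. Enable the experimental Panama NFI backend
export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"

# 3. Boot an application that uses native extensions (Puma in this report)
bundle install
bundle exec puma -e staging
```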
Unfortunately, the crash doesn't occur reliably. Sometimes I get an exception instead, which I think is related to a known issue with the propagation of errno.
Expected behavior
I'd expect the application to behave functionally the same with and without Panama enabled.
Filed internally as GR-58225.
@nirvdrum After a quick look, it would be really helpful, or even necessary, to get the hs_err log. Could you try to get it?
@dougxc told me -XX:LogFile (--vm.XX:LogFile=path as a TruffleRuby argument) can be used to put the hs_err log anywhere.
I'm trying to get the hs_err log, but I'm still running into the limitation I mentioned in the issue description. Unfortunately, it doesn't really matter where I write the file: I don't have the means to mount a volume, so the file goes away when the container is discarded. We have a crash reporting service based on core_pattern that scans for hs_err files and uploads them to a bucket, but it hasn't picked up these files. I'm trying to debug that, but it's a very slow process. I haven't yet been able to reproduce the crash locally.
I'm not sure if it works, but maybe you could try -XX:LogFile=/dev/stdout.
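If that works, a sketch of how it could be wired up, assuming the flag is passed through TRUFFLERUBYOPT as suggested above and that the container's log collector captures stdout:

```sh
# Sketch: forward the suggested flag through TRUFFLERUBYOPT so the VM log
# goes to stdout, which the container's log collector should capture even
# after the container is discarded. This only shows how to pass the flag;
# which error-reporting option ultimately works is whatever the comments
# above settle on.
export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama --vm.XX:LogFile=/dev/stdout"
bundle exec puma -e staging
```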