[GR-58225] Internal Error guarantee failed: "can not load classes with compiler thread" when using NFI Panama
Describe GraalVM and your environment:
- GraalVM version or commit id if built from source: 07d876aff9cb05558784449d0e6b8739aaac9fee
- TruffleRuby version: truffleruby 24.2.0-dev-37fdafbf, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
- CE or EE: EE/GFTC
- JDK version: JDK 24
- OS and OS Version: Ubuntu 24.04
- Architecture: amd64
- The output of `java -Xinternalversion`:
Java HotSpot(TM) 64-Bit Server VM (24+13-jvmci-b01) for linux-amd64 JRE (24+13-jvmci-b01), built on 2024-08-29T16:25:44Z with gcc 13.2.0
Have you verified this issue still happens when using the latest snapshot? Yes. I only see the issue with snapshots since Puma support isn't yet available in a release.
Describe the issue
While running a Rails application with a GFTC EA snapshot of TruffleRuby and the Panama backend enabled, I sometimes see the JVM crash. Unfortunately, since this happens during application boot, I don't have the hs_err log. I'm trying to work with our infrastructure team to preserve it; currently, when the JVM crashes, the deployment is halted and the container is immediately discarded. I do have a copy of the core dump, but it is too large to attach to the issue.
Code snippet or code repository that reproduces the issue
Puma starting in single mode...
* Puma version: 6.4.2 (truffleruby 24.2.0-dev-37fdafbf - ruby 3.2.4) ("The Eagle of Durango")
* Min threads: 120
* Max threads: 120
* Environment: staging
* PID: 91
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (systemDictionary.cpp:631), pid=91, tid=111
# guarantee(THREAD->can_call_java()) failed: can not load classes with compiler thread: class=com/oracle/truffle/nfi/backend/panama/NativePointer, classloader=jdk/internal/loader/ClassLoaders$AppClassLoader
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 24-dev+13.1 (24.0+13) (build 24+13-jvmci-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 24-dev+13.1 (24+13-jvmci-b01, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xe90c86]  SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, JavaThread*)+0x776
[thread 110 also had an error]
#
# Core dump will be written. Default location: Core dumps may be processed with "/var/lib/toolbox/crash-reporter/crash-reporter-binary %p %P %s %E" (or dumping to /app/core.91)
#
# An error report file with more information is saved as:
# /app/hs_err_pid91.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
Steps to reproduce the issue (please include both build steps and run steps)
- Install the latest TruffleRuby GFTC EA build (e.g., with rbenv + ruby-build: `rbenv install truffleruby+graalvm-dev`)
- Enable the Panama backend: `export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"`
- Boot an application that uses native extensions (see the consolidated sketch below)
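For reference, a consolidated shell sketch of the steps above. The rbenv/ruby-build install method and the `bundle exec puma` boot command are assumptions; any application that loads native C extensions should exercise the same path.

```sh
# Sketch of the reproduction steps; the rbenv install method and the Puma
# boot command are assumptions, not the only way to trigger the crash.

# 1. Install the latest TruffleRuby GFTC EA (dev) build
rbenv install truffleruby+graalvm-dev
rbenv local truffleruby+graalvm-dev

# 2. Enable the experimental Panama NFI backend
export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama"

# 3. Boot an application that uses native extensions (Puma in this report)
bundle install
bundle exec puma -e staging
```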
Unfortunately, the crash doesn't occur reliably. Sometimes I get an exception instead, which I think is related to a known issue with the propagation of errno.
Expected behavior
I'd expect the application to behave functionally the same with and without Panama enabled.
Filed internally as GR-58225.
@nirvdrum After a quick look, it would be really helpful, or even necessary, to get the hs_err log. Could you try to get it?
@dougxc told me -XX:LogFile (--vm.XX:LogFile=path as a TruffleRuby argument) can be used to put the hs_err log anywhere.
I'm trying to get the hs_err log, but I'm still running into the limitation I mentioned in the issue description. Unfortunately, it doesn't really matter where I write the file: I don't have the means to mount a volume, so the file goes away when the container is discarded. We have a crash reporting service based on core_pattern that scans for hs_err files and uploads them to a bucket, but it hasn't picked up these files. I'm trying to debug that, but it's a very slow process. I haven't yet been able to reproduce the crash locally.
I'm not sure if it works, but maybe you could try -XX:LogFile=/dev/stdout.
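If that works, a sketch of how it could be wired up, assuming the flag is passed through TRUFFLERUBYOPT as suggested above and that the container's log collector captures stdout:

```sh
# Sketch: forward the suggested flag through TRUFFLERUBYOPT so the VM log
# goes to stdout, which the container's log collector should capture even
# after the container is discarded. This only shows how to pass the flag;
# which error-reporting option ultimately works is whatever the comments
# above settle on.
export TRUFFLERUBYOPT="--experimental-options --ruby.cexts-panama --vm.XX:LogFile=/dev/stdout"
bundle exec puma -e staging
```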