graal
[GR-51307] Unable to collect GC data with NotificationEmitter in native build
A favor I ask of the team
This is the second time I have opened this issue. The first was #7803, opened on November 11, 2023, but it was closed, perhaps because I did not explain it clearly. Please note that this issue is not about compilation problems. It is about NotificationEmitter delivering no data at runtime, which makes it impossible to collect GC metrics. This occurs with both G1GC and SerialGC, in both GraalVM CE and Oracle GraalVM.
Describe the problem
I'm developing applications and some statistics are missing when I use Micrometer, specifically the details and duration of GC pauses. I reviewed the library code and found that the cause appears to be NotificationEmitter not emitting notification events, and that this only happens in the native image. These metrics are very important for us to be able to put the applications into production, so I would like help solving this problem.
Steps to reproduce the issue
- git clone --depth 1 https://github.com/viniciusxyz/graalvm-issue-7803-notification-emmiter.git
- Build app
mvn clean package -Pnative
- Run app
./target/main-notification-emitter
When running the native image, only the log line for adding the notification listener is displayed. When running on HotSpot, a log line from the NotificationEmitter event is also displayed once System.gc() is called.
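For readers who do not want to clone the repository, the behavior described above can be sketched roughly as follows. This is not the code from the linked project; the class name `GcNotificationProbe` and the method `gcNotificationArrives` are made up for illustration. It uses only the standard `java.lang.management` and `com.sun.management` APIs, registers a listener on every GC bean that is a NotificationEmitter, calls System.gc(), and waits for an event.

```java
import com.sun.management.GarbageCollectionNotificationInfo;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical probe class; not taken from the reproducer repository.
public class GcNotificationProbe {

    /** Registers a listener on each GC bean, forces a collection, and waits for one GC event. */
    static boolean gcNotificationArrives(long timeoutMillis) {
        CountDownLatch received = new CountDownLatch(1);

        NotificationListener listener = (notification, handback) -> {
            // Only react to GC notifications; ignore any other notification type.
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            System.out.printf("GC event: %s (%s), duration %d ms%n",
                    info.getGcName(), info.getGcAction(), info.getGcInfo().getDuration());
            received.countDown();
        };

        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gcBean instanceof NotificationEmitter) {
                ((NotificationEmitter) gcBean).addNotificationListener(listener, null, null);
                System.out.println("Listener added to " + gcBean.getName());
            }
        }

        System.gc(); // request a collection so that an event should be emitted

        try {
            return received.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(gcNotificationArrives(5_000)
                ? "GC notification received"
                : "No GC notification received");
    }
}
```

On HotSpot this should print the "Listener added" lines followed by a GC event. According to this report, a native image would stop after the "Listener added" lines.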
Describe GraalVM and its environment:
- GraalVM version: GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
- JDK major version: 21
- OS: Windows 11
- Architecture: AMD64
More details
Screenshot of execution with HotSpot:
Screenshot of execution with native compilation:
All build configuration is in the pom file of the example project.
To show that the problem does not depend on whether GraalVM CE or Oracle GraalVM is used, here is a screenshot showing the same behavior:
As far as I know, there are currently two main ways to expose garbage collector information continuously while an application is running, for example in Kubernetes. The first is a Java agent that exports this information to a provider such as Prometheus. The second is a library that sends the metrics to one of these providers. As far as I have seen, both depend on NotificationEmitter for GC-related updates. Without this improvement, applications that monitor the GC through Prometheus and Micrometer, for example, are left without GC timing data.
Demonstration of the metrics visualized in Grafana
Without native compilation
With native compilation
@wirthi @kassifar I reopened the issue. If any details are unclear, please let me know.
Hi @viniciusxyz, as already mentioned in the previous ticket, there are some open issues around the management interfaces where we still need some work. We are aware of that, but this is currently not a priority for us to fix.
Maybe this is something @roberttoyonaga would like to look into?
Hi @fniephaus. I'm happy to look into this eventually, if nobody else picks this up. However, it probably won't be on my to-do list for some time.
> I'm happy to look into this eventually, if nobody else picks this up
Cool, thanks!
> However, it probably won't be on my to-do list for some time.
Typo? 😆 We're not going to complain if it's done asap
oops 😆 I mean it will probably remain on my to-do list for a while
> Hi @viniciusxyz, as already mentioned in the previous ticket, there are some open issues around the management interfaces where we still need some work. We are aware of that, but this is currently not a priority for us to fix.
I understand this point perfectly. I just found it strange that the issue was marked as completed. Even as a backlog item, I believe it should stay open: GC problems happen quite frequently, and losing traceability for them is currently a drawback of native compilation. That said, I completely understand that it is not an immediate priority.
🙋 We're also interested in having that added.
In the meantime, did anybody already reach out to Micrometer? It looks like this can be detected: the current GraalVM NotificationEmitter does not support any notification types, especially not the one Micrometer filters for. They could detect and log that, as they already do if ManagementFactory.getMemoryPoolMXBeans() returns an empty list.
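The detection idea in the comment above could look roughly like the sketch below. It is only an illustration built on the standard JMX API (`NotificationEmitter.getNotificationInfo()` and `MBeanNotificationInfo.getNotifTypes()`). The class and method names are hypothetical, and this is not code from Micrometer or GraalVM.

```java
import com.sun.management.GarbageCollectionNotificationInfo;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import javax.management.MBeanNotificationInfo;
import javax.management.NotificationEmitter;

// Hypothetical helper; names are illustrative only.
public class GcNotificationSupport {

    /** True if the bean is an emitter that advertises the GC notification type. */
    static boolean advertisesGcNotifications(Object bean) {
        if (!(bean instanceof NotificationEmitter)) {
            return false;
        }
        MBeanNotificationInfo[] infos = ((NotificationEmitter) bean).getNotificationInfo();
        for (MBeanNotificationInfo info : infos) {
            boolean listed = Arrays.asList(info.getNotifTypes())
                    .contains(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION);
            if (listed) {
                return true;
            }
        }
        return false;
    }

    /** True if at least one registered collector advertises GC notifications. */
    static boolean anyCollectorAdvertisesGcNotifications() {
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (advertisesGcNotifications(gcBean)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (!anyCollectorAdvertisesGcNotifications()) {
            // A metrics library could log a warning here instead of silently reporting nothing.
            System.out.println("GC notifications are not advertised by this runtime");
        } else {
            System.out.println("GC notifications are advertised");
        }
    }
}
```

A check like this would not restore the missing metrics. It would only make the absence visible in the logs.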
Very interesting! I hadn't noticed this detail in Micrometer. I'm going to download the code and run some tests. I didn't know this other approach existed, so I didn't open an issue there.
Well, it will not really help with the issue that there will be no metrics. But you'd at least get a log message explaining why. I was specifically referring to this code in JvmGcMetrics.
Understood. I thought there was some fallback, but if there really isn't one, then contacting the Micrometer team probably won't help much, since the problem is the lack of notifications in the native image.
It's quite frustrating to have to make monitoring worse to take advantage of the benefits of native images, but this is a necessary choice at the moment as far as I understand.
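As a rough illustration of what a notification-free approach looks like in general, the standard GarbageCollectorMXBean API also exposes cumulative counters that can be polled. This is only a sketch under assumptions: the class name is hypothetical, it is not something Micrometer is stated to do in this thread, and whether a native image populates these counters has not been verified here. It also yields only totals, not the per-pause details that notifications carry.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical polling helper; cumulative totals only, no per-pause details.
public class GcPollingSnapshot {

    /** Sum of collection counts across collectors; undefined values (-1) are skipped. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gcBean.getCollectionCount());
        }
        return total;
    }

    /** Sum of accumulated collection time in milliseconds; undefined values (-1) are skipped. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gcBean.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollectionCount();
        System.gc();
        long countAfter = totalCollectionCount();
        System.out.println("Collections observed: " + (countAfter - countBefore));
        System.out.println("Total GC time so far (ms): " + totalCollectionTimeMillis());
    }
}
```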