JFR test crosses the set RSS threshold with Mandrel 25.0.0.1
JFRPerfTest crosses the RSS memory threshold. The difference might seem trivial at first sight, but it is consistent over many runs, in CI and locally, on RHEL 8-like and RHEL 9-like Linux systems (amd64). Mandrel 21 and 24 do not cross that threshold, i.e. the JFR overhead is smaller there. The increase might be warranted by more work being done, though.
Mandrel Integration Testsuite
With Mandrel 25.0.0.1:
```
$ mvn clean verify \
    -Ptestsuite -DincludeTags=reproducers,perfcheck,runtimes \
    -Dtest=JFRTest#jfrPerfTest \
    -Dquarkus.version=3.28.1 \
    -Dquarkus.native.container-runtime=podman -Drootless.container-runtime=true \
    -Dpodman.with.sudo=false 2>&1 | tee log.log
```
Quarkus 3.28.1
```
[ERROR] JFRTest.jfrPerfTest:185->jfrPerfTestRun:229->startComparisonForBenchmark:342
Application JFR_PERFORMANCE in mode diff_native consumed 40 kB of RSS memory more ,
which is over 35 kB threshold by 14%.
```
Quarkus 3.20.3
```
[ERROR] JFRTest.jfrPerfTest:185->jfrPerfTestRun:229->startComparisonForBenchmark:342
Application JFR_PERFORMANCE in mode diff_native consumed 38 kB of RSS memory more ,
which is over 35 kB threshold by 9%.
```
Quarkus 3.15.7
```
[ERROR] JFRTest.jfrPerfTest:185->jfrPerfTestRun:229->startComparisonForBenchmark:342
Application JFR_PERFORMANCE in mode diff_native consumed 39 kB of RSS memory more ,
which is over 35 kB threshold by 11%.
```
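For context, the reported overage percentages are consistent with a relative check of the form `(measured - threshold) / threshold`. A minimal sketch of that arithmetic (the class and method names are hypothetical, not the testsuite's actual code):

```java
// Hypothetical sketch of the relative threshold check implied by the logs above;
// not the actual Mandrel integration testsuite code.
public class ThresholdCheck {

    // Percentage by which `measured` exceeds `threshold`, rounded to a whole number.
    static long percentOverThreshold(long measured, long threshold) {
        return Math.round((measured - threshold) * 100.0 / threshold);
    }

    public static void main(String[] args) {
        // Values from the logs above: 40 over 35 -> 14%, 38 -> 9%, 39 -> 11%.
        System.out.println(percentOverThreshold(40, 35)); // 14
        System.out.println(percentOverThreshold(38, 35)); // 9
        System.out.println(percentOverThreshold(39, 35)); // 11
    }
}
```

This reproduces the 14%, 9%, and 11% figures from the three failures above, which suggests the check is relative to the threshold rather than an absolute delta.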
Regardless of the number of iterations, one does not hit this with an older Mandrel, i.e. major version 24 or 21.
@roberttoyonaga FYI, perhaps it could be explained by increased coverage or added instrumentation? Not sure atm. Worth a note though, as it's a consistent result, not a flaky test.
It would be worth noting the new JFR features in Mandrel 25 vs Mandrel 24 and below.
There haven't been any major JFR features added between Mandrel 24 and 25. At the SubstrateVM level there have been bug fixes and minor internal improvements, but nothing that should cause a significant increase in RSS. There have been 3 major JFR features added to OpenJDK 25, but they are all implemented in HotSpot (not much in jdk.jfr). We do not support them yet, so they won't have an effect in Native Image. I'll dig deeper and try to figure out what could be causing this.
@Karm Has the overall RSS consumption changed between 24 and 25?
Side note: it's very strange that it's consistently 38-40% larger. I think there's usually more variation.
~~@Karm do you know if Quarkus is running with more threads than before? I searched the repository for quarkus.vertx.event-loops-pool-size but could not find anything. If more threads are running, then there will be 2 additional JFR buffers for each platform thread (500 kB each by default).~~
Update 1: I dumped JFR snapshots and compared the thread count at startup between 24 and 25. They are both 21. So no difference there.
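For scale, the ruled-out hypothesis above would have predicted a per-thread cost along these lines (buffer count and size are taken from the struck-through comment; the helper itself is hypothetical):

```java
// Back-of-the-envelope for the ruled-out thread-count hypothesis:
// each extra platform thread would add 2 JFR buffers of ~500 kB each
// (figures from the comment above; this is illustrative, not VM code).
public class JfrBufferOverhead {

    // Predicted extra JFR buffer memory in kB for a change in thread count.
    static long extraBufferKb(int threadsBefore, int threadsAfter,
                              int buffersPerThread, int bufferKb) {
        return (long) (threadsAfter - threadsBefore) * buffersPerThread * bufferKb;
    }

    public static void main(String[] args) {
        // Both JFR snapshots showed 21 threads at startup, so the predicted extra cost is 0 kB.
        System.out.println(extraBufferKb(21, 21, 2, 500)); // 0
        // Even a single extra thread would have added ~1000 kB.
        System.out.println(extraBufferKb(21, 22, 2, 500)); // 1000
    }
}
```

Since the startup thread counts are identical, this predicted overhead is zero, consistent with the hypothesis being discarded.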
Running the jfrPerfTest on my computer I get this for RSS:
Mandrel 24: (app with JFR) 54919 kB vs (no JFR) 41537 kB
Mandrel 25: (app with JFR) 54232 kB vs (no JFR) 39084 kB
The thresholds have actually been misnamed: they are % differences, not absolute differences in kB (PR fixing this). This makes the JFR tests different from the other tests that compare against thresholds.
So this means that since the RSS without JFR decreased in Mandrel 25, the threshold is crossed (because the relative contribution of JFR to RSS is higher), even though the RSS with JFR remained roughly the same. It seems like whatever optimization reduced the RSS in Mandrel 25 does not happen when JFR is included in the build. I'll try to figure out what the change was and why.
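Plugging in the numbers measured above makes this concrete: the relative JFR contribution to RSS crosses a 35% threshold only on Mandrel 25. A rough check (the class and method names are made up for illustration; the kB figures are copied from the measurements above):

```java
// Rough check of the reasoning above: relative contribution of JFR to RSS,
// computed from the measurements reported earlier in this thread.
public class JfrRssContribution {

    // Relative RSS increase caused by JFR, in percent.
    static double jfrContributionPercent(long withJfrKb, long withoutJfrKb) {
        return (withJfrKb - withoutJfrKb) * 100.0 / withoutJfrKb;
    }

    public static void main(String[] args) {
        // Mandrel 24: 54919 kB with JFR vs 41537 kB without -> ~32%, under a 35% threshold.
        System.out.printf("Mandrel 24: %.1f%%%n", jfrContributionPercent(54919, 41537));
        // Mandrel 25: 54232 kB with JFR vs 39084 kB without -> ~39%, over the threshold,
        // even though the absolute RSS with JFR barely changed.
        System.out.printf("Mandrel 25: %.1f%%%n", jfrContributionPercent(54232, 39084));
    }
}
```

So the test starts failing not because JFR got more expensive in absolute terms, but because the non-JFR baseline shrank.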
Update 2:
I ran a Quarkus getting-started quickstart with native memory tracking enabled. The native memory sizes shown below were sampled at startup and shutdown. It looks like JFR is consuming the same amount of native memory in Mandrel 24 and 25. I also noticed that the committed Java heap size at startup is ~2 MB smaller in Mandrel 25. This matches the decrease in RSS reported by the jfrPerfTest logs.
In Mandrel 24 with Quarkus 3.18.4
In Mandrel 25 with Quarkus 3.18.4
It's still unclear why executables built with JFR do not experience the smaller heap usage in 25.
This issue appears to be stale because it has been open 30 days with no activity. This issue will be closed in 7 days unless Stale label is removed, a new comment is made, or not-Stale label is added.
I haven't forgotten about this. It's still on my todo list. I plan to revisit it next week.
Adding the not-Stale label helps silence the stale bot :)
Just an update: After some more digging, it seems like at least some of the JFR RSS increase is due to new features in JDK 25. Some of those features we support, some we do not. For the features we do not support, I made a PR https://github.com/oracle/graal/pull/12500 to make the feature code unreachable. This reduces code area and image heap size.
I'm also working on a PR to reduce JFR native memory consumption.
I opened another PR to help reduce JFR native memory consumption https://github.com/oracle/graal/pull/12502