leyden icon indicating copy to clipboard operation
leyden copied to clipboard

8368465: [leyden] Improve precompiler method selection code

Open shipilev opened this issue 2 months ago • 9 comments

Forked from JDK-8366681: there are still some cleanups/performance improvements possible. Current selection code is a bit hairy, and turns out the changes I made for previous patch improve performance.

Notable improvements:

  1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and allows to implement policies for compiling on any A* level based on observing top-compiled T* levels.
  2. Sort methods by hotness and code size. This looks to have a positive effect on shorter workloads, I suspect because we are avoiding a lot of C1 compilations by preloading hottest code first.

Additional testing:

  • [x] Performance tests (see comments)
  • [x] Linux x86_64 server fastdebug, runtime/cds

Progress

  • [x] Change must not contain extraneous whitespace
  • [ ] Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8368465: [leyden] Improve precompiler method selection code (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
$ git checkout pull/99

Update a local copy of the PR:
$ git checkout pull/99
$ git pull https://git.openjdk.org/leyden.git pull/99/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 99

View PR using the GUI difftool:
$ git pr show -t 99

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/99.diff

Using Webrev

Link to Webrev Comment

shipilev avatar Sep 23 '25 12:09 shipilev

javac test (1000 iterations trained, 50 iterations production)

# --- Before

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:AOTCache=app.aot -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
  Time (mean ± σ):     338.2 ms ±   3.5 ms    [User: 742.4 ms, System: 120.6 ms]
  Range (min … max):   332.3 ms … 342.9 ms    10 runs
 
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):     497.2 ms ±   4.1 ms    [User: 491.6 ms, System: 55.5 ms]
  Range (min … max):   489.7 ms … 502.3 ms    10 runs
 

# --- After

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:AOTCache=app.aot -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
  Time (mean ± σ):     322.8 ms ±   2.2 ms    [User: 511.0 ms, System: 101.3 ms]
  Range (min … max):   319.1 ms … 325.9 ms    10 runs
 
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):     483.0 ms ±   4.5 ms    [User: 476.9 ms, System: 55.4 ms]
  Range (min … max):   476.5 ms … 492.0 ms    10 runs

user time significantly improves, I think that's because we manage to preload the hottest code before C1 discovers it needs to compile it. See how loading is half of the workload time, and "before" C1 was more active:

Before: plot-javac50-before

After: plot-javac50-after

shipilev avatar Sep 23 '25 12:09 shipilev

:wave: Welcome back shade! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Sep 23 '25 12:09 bridgekeeper[bot]

Larger benchmarks all improve with 1-core tests.

quarkus-getting-started:

Run,Old CDS + AOT,New CDS + AOT
1,235,190
2,234,198
3,224,189
4,227,185
5,224,199
6,226,192
7,234,193
8,219,181
9,218,196
10,214,197
Geomean,225.40,191.92 (1.17x improvement)
Stdev,6.87,5.57

helidon-quickstart-se

Run,Old CDS + AOT,New CDS + AOT
1,196,167
2,191,166
3,195,166
4,196,173
5,199,165
6,203,169
7,199,168
8,200,162
9,198,167
10,199,173
Geomean,197.58,167.57 (1.18x improvement)
Stdev,3.10,3.23

micronaut-first-app

Run,Old CDS + AOT,New CDS + AOT
1,239,239
2,247,246
3,258,237
4,262,229
5,250,234
6,233,233
7,249,236
8,256,239
9,250,233
10,259,241
Geomean,250.15,236.66 (1.06x improvement)

spring-boot-getting-started:

Run,Old CDS + AOT,New CDS + AOT
1,494,450
2,489,440
3,492,441
4,494,426
5,480,440
6,501,441
7,521,441
8,483,443
9,486,435
10,490,441
Geomean,492.88,439.76 (1.12x improvement)
Stdev,10.93,5.78

spring-petclinic:

Run,Old CDS + AOT,New CDS + AOT
1,2691,2407
2,2659,2451
3,2559,2439
4,2645,2433
5,2667,2475
6,2650,2447
7,2647,2454
8,2667,2439
9,2635,2440
10,2649,2446
Geomean,2646.69,2443.05 (1.08x improvement)
Stdev,32.84,16.34

shipilev avatar Sep 23 '25 12:09 shipilev

❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.

openjdk[bot] avatar Sep 23 '25 12:09 openjdk[bot]

Webrevs

mlbridge[bot] avatar Sep 23 '25 12:09 mlbridge[bot]

Ready for review, folks. There are performance benefits of doing this, very apparently.

shipilev avatar Oct 17 '25 07:10 shipilev

I think we have small performance "issue" how we replace existing JITed code with new one which AOT code loading could be more sensitive. We deoptimize old code before new code is set under lock NMethodState_lock : https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/ci/ciEnv.cpp#L1058

If lock is held by other thread we may deoptimize previous code and go into interpreter before new code is set for use.

This is present in mainline but with normal JIT compilation replacement it may be not noticeable.

vnkozlov avatar Oct 18 '25 00:10 vnkozlov

An other suggestion for this concurrent preloading would be to split A4 preload code. One set is the current which needs to wait compute_java_loaders(). And new one (much smaller) is for simple methods for classes which are loaded first (String, for example) which we can preload much sooner.

vnkozlov avatar Oct 18 '25 00:10 vnkozlov

@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Nov 15 '25 03:11 bridgekeeper[bot]