jdk move nmethods relocInfo data to C heap

Benchmark              Mode  Cnt    Score   Error  Units
JmhDotty.runOperation    ss   13  799.173 ± 8.989  ms/op

Statistics for 366 native nmethods:
 N. total size  = 214832
 N. relocation  = 6696
 N. main code   = 99604
 N. oops        = 352
 N. metadata    = 16
Statistics for 21054 bytecoded nmethods for C1:
 total size      = 111065192 (100%)
 in CodeCache    = 72805904 (65.552406%)
   header        = 5221392 (7.171660%)
   relocation    = 6663072 (9.151829%)
   constants     = 576 (0.000791%)
   main code     = 58382064 (80.188637%)
   stub code     = 5310688 (7.294310%)
   oops          = 513712 (0.705591%)
   metadata      = 3377472 (4.639009%)
 immutable data  = 31596216 (28.448351%)
   dependencies  = 630760 (1.996315%)
   nul chk table = 790424 (2.501641%)
   handler table = 181552 (0.574600%)
   scopes pcs    = 16425728 (51.986378%)
   scopes data   = 13567752 (42.941067%)
 mutable data    = 6663072 (5.999244%)
Statistics for 8168 bytecoded nmethods for C2:
 total size      = 63557016 (100%)
 in CodeCache    = 27389224 (43.093945%)
   header        = 2025664 (7.395843%)
   relocation    = 3363144 (12.279078%)
   constants     = 704 (0.002570%)
   main code     = 19860704 (72.512840%)
   stub code     = 2490304 (9.092277%)
   oops          = 293760 (1.072539%)
   metadata      = 2718088 (9.923932%)
 immutable data  = 32804648 (51.614521%)
   dependencies  = 872000 (2.658160%)
   nul chk table = 519232 (1.582800%)
   handler table = 1593032 (4.856117%)
   scopes pcs    = 16690832 (50.879471%)
   scopes data   = 13129552 (40.023449%)
 mutable data    = 3363144 (5.291539%)

Progress

[ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
[x] Change must not contain extraneous whitespace
[ ] Commit message must refer to an issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276
$ git checkout pull/21276

Update a local copy of the PR:
$ git checkout pull/21276
$ git pull https://git.openjdk.org/jdk.git pull/21276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21276

View PR using the GUI difftool:
$ git pr show -t 21276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21276.diff

Oct 01 '24 02:10 bulasevich

:wave: Welcome back bulasevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Oct 01 '24 02:10 bridgekeeper[bot]

@bulasevich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8343789: Move mutable nmethod data out of CodeCache

Reviewed-by: kvn, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 41 new commits pushed to the master branch:

ffa63409884e9a2d41f5223ab5962980edbb008c: 8351567: Jar Manifest test ValueUtf8Coding produces misleading diagnostic output
8d8bd0c4b3bfdc7670bba03b01b0a00dac9f9825: 8349492: Update sun/security/pkcs12/KeytoolOpensslInteropTest.java to use a recent Openssl version
73465b9866da8e97e557d6ad80a95568ada3ee01: 8160327: Support for thumbnails present in APP1 marker for JPEG
dbdbbd473061d7e8077ed07ccc6b03065a8c2ffc: 8348597: Update HarfBuzz to 10.4.0
7999091e3e976fe62d859d508bf649b6ec7bc94e: 8351555: Help section added in JDK-8350638 uses invalid HTML
8450ae902ee012b6447015b24369eee85c091ec5: 8351440: Link with -reproducible on macOS
b40be22512a8d3b3350fef8d6668d80134a6f1a6: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges
6b84bdef3b203e62cebd77705ef5b3e081302c28: 8350007: Add usage message to the javadoc executable
32f2c2d80894552b8c5329cfa51c7e836314901f: 8351017: ChronoUnit.MONTHS.between() not giving correct result when date is in February
d90b79a2bd2f8bb6e50aa32aafe47748ef6ebeff: 8351046: Rename ObjectMonitor functions
... and 31 more: https://git.openjdk.org/jdk/compare/cfab88b1a2351a187bc1be153be96ca983a7776c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Oct 01 '24 02:10 openjdk[bot]

@bulasevich The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Oct 01 '24 02:10 openjdk[bot]

@vnkozlov Hi Vladimir, What do you think about the idea of moving relocInfo data out of nmethod additionally to recent Move immutable nmethod data from CodeCache? It would reduce the CodeHeap fill by 5%.

Oct 03 '24 15:10 bulasevich

Note, with https://bugs.openjdk.org/browse/JDK-8334691 and other changes I moving into direction to make relocation info data immutable. It is already "immutable" in mainline JDK after https://bugs.openjdk.org/browse/JDK-8333819. But it is still mutable in Leyden because we have to patch indexes during publishing nmethod.

My idea was to move relocation info data (which has big size) into immutable data section of nmethod. And leave mutable _oops and _metadata together with code since they are relatively small and we need to patch them together with code.

Nov 07 '24 23:11 vnkozlov

Mutable sizes % do not add up:

mutable data    = 6071648 (9.396180%)
   relocation    = 3437176 (12.846409%)
   oops          = 239488 (0.895084%)
   metadata      = 2394984 (8.951227%)

Nov 07 '24 23:11 vnkozlov

Main question is: why you did it only for nmethod?

Yes. I did it symmetrically to a separate immutable data storage for nmethod. Now I see I do not like implementation where for some blobs relocation info is local, but for other it stays aside. I am going to rework that.

any performance effects with this change?

On the aarch machine I see a slight improvement on the big benchmark caused by the code sparsity improvement. Though I need to do more benchmarking to make sure I am not making things worse for others.

Benchmark              Mode  Cnt    Score   Error  Units  |  Benchmark              Mode  Cnt    Score   Error  Units
JmhDotty.runOperation    ss  999  861.717 ± 1.543  ms/op  |  JmhDotty.runOperation    ss  999  840.959 ± 1.473  ms/op
                                                          |
       34555411781      cache-misses:u                    |         34343012187      cache-misses:u
     2913869717708      cpu-cycles:u                      |       2863838151745      cpu-cycles:u
     4185324759051      instructions:u                    |       4209616523046      instructions:u            
     1460914744576      L1-icache-loads:u                 |       1452066316397      L1-icache-loads:u
       97806845375      L1-icache-load-misses:u           |         93815390496      L1-icache-load-misses:u   
     1191854820746      iTLB-loads:u                      |       1169231847276      iTLB-loads:u
       10591067761      iTLB-load-misses:u                |         10134696419      iTLB-load-misses:u        
      838964735227      branch-loads:u                    |        838353168582      branch-loads:u
       25829615231      branch-load-misses:u              |         24361474411      branch-load-misses:u
      836291984964      br_pred:u                         |        838153583659      br_pred:u
       25733552818      br_mis_pred:u                     |         24353396612      br_mis_pred:u
         562168308      group0-code_sparsity:u            |           449848707      group0-code_sparsity:u

Nov 09 '24 11:11 bulasevich

Mutable sizes % do not add up:

Thanks. The correct sizes:

Statistics for 21032 bytecoded nmethods for C1:
 mutable data    = 10488856 (8.688409%)
   relocation    = 6573064 (62.667118%)
   oops          = 515680 (4.916456%)
   metadata      = 3400112 (32.416424%)
Statistics for 8171 bytecoded nmethods for C2:
 mutable data    = 6572064 (10.118859%)
   relocation    = 3406216 (51.828709%)
   oops          = 305912 (4.654733%)
   metadata      = 2859936 (43.516556%)

Nov 09 '24 11:11 bulasevich

My idea was to move relocation info data (which has big size) into immutable data section of nmethod. And leave mutable _oops and _metadata together with code since they are relatively small and we need to patch them together with code.

Hmm. If relocation info goes to an immutable blob, oops+metadata hardly deserves a separate blob.

Nov 09 '24 11:11 bulasevich

Performance update. On an aarch machine the CodeCacheStress benchmark shows a 1-2% performance improvement with this change,

Statistics on the CodeCacheStress benchmark with high numberOfClasses-instanceCount-rangeOfClasses parameter values:

nmethod_count:134000, total_compilation_time: 510460ms
total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
total allocation size (mutable/immutable/nmentod): 64MB/192MB/488MB

Nov 14 '24 13:11 bulasevich

-XX:+PrintNMethodStatistics

Statistics for 21032 bytecoded nmethods for C1:
 total size      = 120722408 (100%)
 in CodeCache    = 79358808 (65.736603%)
   header        = 5215936 (6.572599%)
   constants     = 320 (0.000403%)
   main code     = 69017912 (86.969444%)
   stub code     = 5124640 (6.457557%)
 mutable data    = 10488856 (8.688409%)
   relocation    = 6573064 (62.667118%)
   oops          = 515680 (4.916456%)
   metadata      = 3400112 (32.416424%)
 immutable data  = 30874744 (25.574991%)
   dependencies  = 636240 (2.060714%)
   nul chk table = 756920 (2.451583%)
   handler table = 180456 (0.584478%)
   scopes pcs    = 16052608 (51.992683%)
   scopes data   = 13248520 (42.910542%)
Statistics for 8171 bytecoded nmethods for C2:
 total size      = 64948664 (100%)
 in CodeCache    = 25580504 (39.385727%)
   header        = 2026408 (7.921689%)
   constants     = 448 (0.001751%)
   main code     = 20925472 (81.802422%)
   stub code     = 2628176 (10.274137%)
 mutable data    = 6572064 (10.118859%)
   relocation    = 3406216 (51.828709%)
   oops          = 305912 (4.654733%)
   metadata      = 2859936 (43.516556%)
 immutable data  = 32796096 (50.495411%)
   dependencies  = 926992 (2.826532%)
   nul chk table = 537024 (1.637463%)
   handler table = 1695568 (5.170030%)
   scopes pcs    = 15451968 (47.115265%)
   scopes data   = 14184544 (43.250710%)

Nov 21 '24 13:11 bulasevich

Webrevs

Nov 21 '24 14:11 mlbridge[bot]

It would be nice to make relocations immutable, but one roadblock is the use of relocInfo::change_reloc_info_for_address() by C1 patching. We would need to separate mutable and immutable relocations, or replace C1 patching with deoptimization, like on DEOPTIMIZE_WHEN_PATCHING aarch64.

Nov 22 '24 00:11 dean-long

@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

Dec 11 '24 13:12 openjdk[bot]

/reviewers 2

Jan 09 '25 01:01 dean-long

@dean-long The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

Jan 09 '25 01:01 openjdk[bot]

I just noticed the old code would have replaced the oop_Relocation with a internal_word_Relocation for the NearCpool==false case. How did that ever work correctly?

Thanks for review!

NearCpool==false was not functioning correctly. I had considered a separate fix for it, but with the current change, the incorrect code is eliminated.

$ ~/jdk-23/bin/java -XX:-NearCpool -XX:+UseShenandoahGC -version
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000ffff6f94c190, pid=95263, tid=95264
#
# JRE version: OpenJDK Runtime Environment (23.0+38) (build 23+38)
# Java VM: OpenJDK 64-Bit Server VM (23+38, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-aarch64)
# Problematic frame:
# j  java.lang.module.ModuleDescriptor.<init>(Ljava/lang/String;Ljava/lang/module/ModuleDescriptor$Version;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/lang/String;IZ)V+29 java.base
#
# Core dump will be written.

Jan 10 '25 11:01 bulasevich

I see this failure in our testing with an internal test:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/code/nmethod.cpp:1513), pid=1863674, tid=1863692
#  assert(static_cast<int>(_metadata_offset) == reloc_size + oop_size) failed: failed: 21744 != 87280

Current CompileTask:
C2:61370 3693    b  4       org.apache.lucene.analysis.en.KStemData1::<clinit> (27623 bytes)

Stack: [0x00007f3acf3f5000,0x00007f3acf4f5000],  sp=0x00007f3acf4f0a50,  free space=1006k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x14d9151]  nmethod::nmethod(Method*, CompilerType, int, int, int, int, int, unsigned char*, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, CompLevel, char*, int, JVMCINMethodData*)+0x981  (nmethod.cpp:1513)
V  [libjvm.so+0x14d94df]  nmethod::new_nmethod(methodHandle const&, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, CompLevel, char*, int, JVMCINMethodData*)+0x22f  (nmethod.cpp:1200)
V  [libjvm.so+0x926cb4]  ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, bool, bool, bool, bool, int)+0x4c4  (ciEnv.cpp:1063)
V  [libjvm.so+0x157b0d5]  PhaseOutput::install_code(ciMethod*, int, AbstractCompiler*, bool, bool)+0x125  (output.cpp:3443)
V  [libjvm.so+0xa54d02]  Compile::Code_Gen()+0x612  (compile.cpp:3033)
V  [libjvm.so+0xa579cf]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c6f  (compile.cpp:885)
V  [libjvm.so+0x89f495]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5  (c2compiler.cpp:142)
V  [libjvm.so+0xa63a88]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928  (compileBroker.cpp:2319)
V  [libjvm.so+0xa647c8]  CompileBroker::compiler_thread_loop()+0x528  (compileBroker.cpp:1977)
V  [libjvm.so+0xf3478e]  JavaThread::thread_main_inner()+0xee  (javaThread.cpp:777)
V  [libjvm.so+0x1880536]  Thread::call_run()+0xb6  (thread.cpp:232)
V  [libjvm.so+0x155a188]  thread_native_entry(Thread*)+0x128  (os_linux.cpp:860)

Jan 14 '25 09:01 TobiHartmann

Looks like this also triggers with compiler/escapeAnalysis/TestFindInstMemRecursion.java on Linux x64 and -XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing. This change needs more testing before integration.

Jan 14 '25 09:01 TobiHartmann

@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

Jan 15 '25 10:01 openjdk[bot]

@TobiHartmann

I see this failure in our testing with an internal test

Thanks for catching that issue! I've addressed it.

This change needs more testing before integration.

Thank you for pointing that out. Here is my test plan for additional validation. Please let me know if you have any suggestions or if adjustments are needed:

Functional Testing:

Platforms: x86 and AArch64
Build Configurations: Release, Fastdebug, and Slowdebug
Java Options: Default and -XX:+UseShenandoahGC
Tests: jtreg tests covering Hotspot and JDK, tiers 1–3

Performance Testing:

Benchmarks: DaCapo and Renaissance
Modes: Default and -XX:+UseShenandoahGC

Jan 28 '25 12:01 bulasevich

Sounds good. I'll run some testing as well and will report back once it passed.

Jan 31 '25 09:01 TobiHartmann

@bulasevich, please wait my review before integration.

Jan 31 '25 19:01 vnkozlov

replace C1 patching with deoptimization, like on DEOPTIMIZE_WHEN_PATCHING aarch64.

It might be worth looking at that. My experiments at the time the AArch64 port was done indicated that only 10% of deoptimization on AArch64 was caused by patching events.

It is possible to generate C1 code for AArch64 that is patchable, but the frequent additional indirections in generated code are worse for performance than the occasional deoptimization.

Feb 02 '25 17:02 theRealAph

I see more failures in testing:

compiler/jsr292/CallSiteDepContextTest.java fails on Windows with -XX:+UseZGC:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (workspace\open\src\hotspot\share\code\dependencies.cpp:2038), pid=109676, tid=68072
#  assert(call_site->is_a(vmClasses::CallSite_klass())) failed: sanity

Current thread (0x000002b4343ac600):  JavaThread "MainThread"        [_thread_in_vm, id=68072, stack(0x000000f121300000,0x000000f121400000) (1024K)]

Stack: [0x000000f121300000,0x000000f121400000],  sp=0x000000f1213fe9f0,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x601c63]  Dependencies::check_call_site_target_value+0x3e3  (dependencies.cpp:2038)
V  [jvm.dll+0x60182f]  Dependencies::DepStream::check_call_site_dependency+0x6f  (dependencies.cpp:2145)
V  [jvm.dll+0xd0898d]  nmethod::check_dependency_on+0x7d  (nmethod.cpp:2853)
V  [jvm.dll+0x609a19]  DependencyContext::mark_dependent_nmethods+0xb9  (dependencyContext.cpp:74)
V  [jvm.dll+0xcd2ff2]  MethodHandles::mark_dependent_nmethods+0xa2  (methodHandles.cpp:953)
V  [jvm.dll+0xccbf2e]  MHN_setCallSiteTargetNormal+0x1ee  (methodHandles.cpp:1211)
C  0x000002b02377ebac  (no source info available)

Same test also fails with:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/oops/compressedKlass.inline.hpp:88), pid=95938, tid=35843
#  assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710)

---------------  T H R E A D  ---------------

Current thread (0x0000000142865e10):  JavaThread "MainThread"        [_thread_in_vm, id=35843, stack(0x0000000170c4c000,0x0000000170e4f000) (2060K)]

Stack: [0x0000000170c4c000,0x0000000170e4f000],  sp=0x0000000170e4dd00,  free space=2055k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x1161294]  VMError::report(outputStream*, bool)+0x1aac  (compressedKlass.inline.hpp:88)
V  [libjvm.dylib+0x1164880]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x548
V  [libjvm.dylib+0x576b0c]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0x70a898]  CompressedKlassPointers::check_encodable(void const*)+0x0
V  [libjvm.dylib+0x585d24]  oopDesc::klass() const+0x104
V  [libjvm.dylib+0x5afbec]  Dependencies::check_call_site_target_value(oop, oop, CallSiteDepChange*)+0xcc
V  [libjvm.dylib+0x5b09e4]  Dependencies::DepStream::check_call_site_dependency(CallSiteDepChange*)+0x84
V  [libjvm.dylib+0xde663c]  nmethod::check_dependency_on(DepChange&)+0x50
V  [libjvm.dylib+0x5b291c]  DependencyContext::mark_dependent_nmethods(DeoptimizationScope*, DepChange&)+0xa4
V  [libjvm.dylib+0xda4dfc]  MethodHandles::mark_dependent_nmethods(DeoptimizationScope*, Handle, Handle)+0xec
V  [libjvm.dylib+0xda7ea4]  MHN_setCallSiteTargetNormal+0x304
j  java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Ljava/lang/invoke/CallSite;Ljava/lang/invoke/MethodHandle;)V+0 java.base@25-internal
j  java.lang.invoke.CallSite.setTargetNormal(Ljava/lang/invoke/MethodHandle;)V+7 java.base@25-internal
j  java.lang.invoke.MutableCallSite.setTarget(Ljava/lang/invoke/MethodHandle;)V+2 java.base@25-internal
j  compiler.jsr292.CallSiteDepContextTest.testGC(ZZ)V+267
j  compiler.jsr292.CallSiteDepContextTest.main([Ljava/lang/String;)V+11
j  java.lang.invoke.LambdaForm$DMH+0x00000c0000001c00.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V+10 java.base@25-internal
j  java.lang.invoke.LambdaForm$MH+0x00000c0000003400.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+33 java.base@25-internal
j  java.lang.invoke.Invokers$Holder.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+20 java.base@25-internal
j  jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+55 java.base@25-internal
j  jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+23 java.base@25-internal
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+102 java.base@25-internal
j  com.sun.javatest.regtest.agent.MainWrapper$MainTask.run()V+134
j  java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base@25-internal
j  java.lang.Thread.run()V+19 java.base@25-internal
v  ~StubRoutines::call_stub 0x00000001160d0180
V  [libjvm.dylib+0x88beec]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x448
V  [libjvm.dylib+0x88ac08]  JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x1c4
V  [libjvm.dylib+0x88ada4]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x6c
V  [libjvm.dylib+0xa1289c]  thread_entry(JavaThread*, JavaThread*)+0x12c
V  [libjvm.dylib+0x8c2088]  JavaThread::thread_main_inner()+0x1a8
V  [libjvm.dylib+0x10ae844]  Thread::call_run()+0xf4
V  [libjvm.dylib+0xe467f0]  thread_native_entry(Thread*)+0x138
C  [libsystem_pthread.dylib+0x6f94]  _pthread_start+0x88
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Ljava/lang/invoke/CallSite;Ljava/lang/invoke/MethodHandle;)V+0 java.base@25-internal
j  java.lang.invoke.CallSite.setTargetNormal(Ljava/lang/invoke/MethodHandle;)V+7 java.base@25-internal
j  java.lang.invoke.MutableCallSite.setTarget(Ljava/lang/invoke/MethodHandle;)V+2 java.base@25-internal
j  compiler.jsr292.CallSiteDepContextTest.testGC(ZZ)V+267
j  compiler.jsr292.CallSiteDepContextTest.main([Ljava/lang/String;)V+11
j  java.lang.invoke.LambdaForm$DMH+0x00000c0000001c00.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;)V+10 java.base@25-internal
j  java.lang.invoke.LambdaForm$MH+0x00000c0000003400.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+33 java.base@25-internal
j  java.lang.invoke.Invokers$Holder.invokeExact_MT(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+20 java.base@25-internal
j  jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+55 java.base@25-internal
j  jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+23 java.base@25-internal
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+102 java.base@25-internal
j  com.sun.javatest.regtest.agent.MainWrapper$MainTask.run()V+134
j  java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base@25-internal
j  java.lang.Thread.run()V+19 java.base@25-internal
v  ~StubRoutines::call_stub 0x00000001160d0180
Lock stack of current Java thread (top to bottom):

Feb 03 '25 09:02 TobiHartmann

@stefank just pointed out that this could be a regression from JDK-8347564 that just went in.

Feb 03 '25 09:02 TobiHartmann

@stefank just pointed out that this could be a regression from JDK-8347564 that just went in.

I thought that it could have been a regression but after looking closer I don't think it is. However, I do see a problem that the code removes the GC barriers when loading oops. I'll add an inline comment.

Feb 03 '25 12:02 stefank

When using Shenadoah I am still seeing embedded compressed OOPs in the instruction stream. Is that correct? These form two instructions, like so:

  movz(dst, 0xDEAD, 16);
  movk(dst, 0xBEEF);

Do you want compressed OOPs to be moved out of CodeCache as well as uncompressed OOPs? If so, you should change loadConNNodein C2.

Feb 03 '25 15:02 theRealAph

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 2/7/25 18:33, Boris Ulasevich wrote:

I think ADP+MOVK is better both in terms of performance and code density.

Good work, you may be right.

Neoverse N1 has a fairly narrow (4 wide) decoder, so I guess it's more likely to be limited by instruction count.

That benchmark isn't valid for GCC on my machine, because its outputs aren't used so GCC doesn't generate code for the asm.

However, if we change the benchmark to actually *do something* with the data (simply add the results together) we get this for movz+movk on Apple M1:

2,332,615,983 cycles:u # 3.135 GHz (95.26%) 18,660,205,348 instructions:u # 8.00 insn per cycle (95.26%)

and this for adrp+movk:

2,563,872,489 cycles:u # 3.057 GHz (96.03%) 14,357,197,644 instructions:u # 5.60 insn per cycle (96.03%)

Here we can see that the M1 is totally front-end limited: 8 ipc is the speed of light on an M1. Nonetheless, the timings are similar, with the win going to movz+movk.

On Neoverse V2, I also see an advantage for adrp+movk:

4162362189 cycles:u # 2.796 GHz 25002398111 instructions:u # 6.01 insn per cycle

3243420864 cycles:u # 2.796 GHz 21002398115 instructions:u # 6.48 insn per cycle

So, looks like adrp+movk has an overall advantage. I'm still somewhat skeptical that this usage really deserves a reloc handler of its own, though, given the usage.

If we do decide to do this, please give the forced movk version of adrp() a new name, and have adrp() call it.

Having said all of that, I'm not sure why we're seeing such different instruction counts for Apple M1 and Neoverse V2, I guess it must be the compiler but I don't know why, so take all of this with a big pinch of salt. For this really to be valid I guess we'd have to use the exact same binaries.

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Feb 08 '25 10:02 mlbridge[bot]

Mailing list message from Andrew Haley on hotspot-compiler-dev:

On 2/8/25 10:38, Andrew Haley wrote:

On 2/7/25 18:33, Boris Ulasevich wrote:

I think ADP+MOVK is better both in terms of performance and code density.

Good work, you may be right.

Neoverse N1 has a fairly narrow (4 wide) decoder, so I guess it's more likely to be limited by instruction count.

That benchmark isn't valid for GCC on my machine, because its outputs aren't used so GCC doesn't generate code for the asm.

However, if we change the benchmark to actually *do something* with the data (simply add the results together) we get this for movz+movk on Apple M1:

Sorry, I messed up:

2,916,015,951 cycles:u # 3.151 GHz (95.33%) 22,773,874,629 instructions:u # 7.81 insn per cycle (95.33%)

adrp:

3,132,136,323 cycles:u # 3.114 GHz (96.60%) 18,420,464,246 instructions:u # 5.88 insn per cycle (96.60%)

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Feb 08 '25 11:02 mlbridge[bot]

jdk jdk copied to clipboard

move nmethods relocInfo data to C heap

Progress

Reviewing

Webrevs

jdk
jdk copied to clipboard