8321509: False positive in get_trampoline fast path causes crash
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8321509: False positive in get_trampoline fast path causes crash (Bug - P3)
Reviewers
- Vladimir Kozlov (@vnkozlov - Reviewer)
- Tobias Hartmann (@TobiHartmann - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19796/head:pull/19796
$ git checkout pull/19796
Update a local copy of the PR:
$ git checkout pull/19796
$ git pull https://git.openjdk.org/jdk.git pull/19796/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19796
View PR using the GUI difftool:
$ git pr show -t 19796
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19796.diff
Webrev
:wave: Welcome back dlong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@dean-long This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8321509: False positive in get_trampoline fast path causes crash
Reviewed-by: kvn, adinn, thartmann
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 195 new commits pushed to the master branch:
- b32e4a68bca588d908bd81a398eb3171a6876dc5: 8335356: Shenandoah: Improve concurrent cleanup locking
- 62cbf70346e78ca94ce6ea4ba5a308ea0a2bbfa8: 8336085: Fix simple -Wzero-as-null-pointer-constant warnings in CDS code
- 2928753bd95356467e4fe42ee391e45d1cb6e89c: 8324966: Allow selecting jtreg test case by ID from make
- 1772a929af0c31bf22153cc19c5d11b00273453b: 8334457: Test javax/swing/JTabbedPane/bug4666224.java fail on macOS with because pressing the ‘C’ key does not switch the layout to WRAP_TAB_LAYOUT
- b7d0eff5ad77e338b237773d2fc047eea3d2ac12: 8207908: JMXStatusTest.java fails assertion intermittently
- cf940e139a76e5aabd52379b8a87065d82b2284c: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
- b363de8c9fbf7d9e4aade41a2e883cc83ced320b: 8335946: DTrace code snippets should be generated when DTrace flags are enabled
- d6c6847e32673d36a1958cefd1851ec9f3b1e2ad: 8335743: jhsdb jstack cannot print some information on the waiting thread
- cad68e06ecad1e19091d1af9c0f9b8145d6842fb: 8335935: Chained builders not sending transformed models to next transforms
- 242f1133f8e1b373de3714cefc7f6701c39707fe: 8334481: [JVMCI] add LINK_TO_NATIVE to MethodHandleAccessProvider.IntrinsicMethod
- ... and 185 more: https://git.openjdk.org/jdk/compare/974dca80df71c5cbe492d1e8ca5cee76bcc79358...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@dean-long The following label will be automatically applied to this pull request:
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 03: Full - Incremental (5845db59)
- 02: Full - Incremental (09ed1da4)
- 01: Full - Incremental (e699b531)
- 00: Full (46e77d32)
AArch64 binds some trampoline call-sites early, thanks to its is_always_within_branch_range() check. This allows a false positive match with a trampoline stub during code buffer expansion in rare situations. To fix this, this PR makes the following changes:
- Do not call get_trampoline() in Relocation::pd_call_destination or pd_set_call_destination, as the destination they use cannot be trusted during fixup.
- Restrict NativeCall::get_trampoline() to operate only on nmethods, not CodeBuffers (or BufferBlobs) (see the sketch after this list)
- Fix up trampoline stub "owners" (call sites) as late as possible, in the new trampoline_stub_Relocation::pd_fix_owner_after_move(), and only if the destination is an nmethod.
- Avoid calling NativeCall::set_destination_mt_safe() during CodeBuffer fixup, which allows assert_lock to also go away
- Detect self-calls in NativeCall::destination() to avoid unnecessary call to find_blob()
- Add NativeCall fast paths for pd_call_destination/pd_set_call_destination
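A rough sketch of the nmethod-only restriction from the second item above (names follow the existing AArch64 code; the guard shape and surrounding logic in the actual patch may differ):
address NativeCall::get_trampoline() {
  address call_addr = addr_at(0);
  CodeBlob* code = CodeCache::find_blob(call_addr);
  // Only trust trampoline lookups inside a finished nmethod; a CodeBuffer or
  // BufferBlob does not have its final layout yet, so report "no trampoline".
  if (code == nullptr || !code->is_nmethod()) {
    return nullptr;
  }
  return trampoline_stub_Relocation::get_trampoline_for(call_addr, code->as_nmethod());
}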
Thanks Vladimir.
I am hoping an AArch64 expert can take a look at this. @theRealAph maybe?
This solution looks ok to me as far as jdk mainline is concerned. However, I think there is a problem as far as Leyden is concerned.
The code changes Relocation::pd_call_destination to always expect its associated call to be embedded within an nmethod when orig_addr is null (i.e. when it is called with no args as reloc.pd_call_destination()). This is where the problem arises.
Currently, Leyden calls Relocation::pd_call_destination() from CallRelocation::destination() (and also from trampoline_stub_Relocation::destination()) when storing an nmethod to the CDS code cache. It needs to do this in order to be able to track relocs of type virtual_call_type, opt_virtual_call_type, static_call_type and runtime_call_type (also trampoline_stub_type). That is because all these relocs need their call destination to be adjusted when the nmethod is restored from the CDS code cache.
However, we already have prototype code in Leyden to store generated blobs to the CDS code cache. These blobs may legitimately include runtime_call_type relocs which also need tracking and adjusting at restore. For example, shared runtime or compiler stubs may call out to the JVM. Likewise, stubs in a stub generator blob may need to call out to the JVM or to a stub in some earlier generated blob. So, Leyden will need to call CallRelocation::destination() in cases where the associated call is embedded in a non-nmethod. Note that these calls will never employ trampolines.
The obvious fix is to modify Relocation::pd_call_destination so that it drops through to call MacroAssembler::pd_call_destination if the incoming blob is not an nmethod.
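A hedged sketch of that drop-through, in the same diff-fragment style used elsewhere in this thread (the exact placement inside Relocation::pd_call_destination is an assumption):
address Relocation::pd_call_destination(address orig_addr) {
+  // Drop through for non-nmethod blobs (e.g. Leyden-generated stubs): there is
+  // no trampoline stub to consult, so read the target from the instruction.
+  if (orig_addr == nullptr) {
+    CodeBlob* cb = CodeCache::find_blob(addr());
+    if (cb == nullptr || !cb->is_nmethod()) {
+      return MacroAssembler::pd_call_destination(addr());
+    }
+  }
  ...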
@adinn is right. I thought that it mostly affects code during CodeBlob expansion, but it does not. I applied the patch to the Leyden/premain repo and hit an assert when we generate AOT code, because we are still processing the CodeBuffer before the nmethod is created:
# Internal Error (/work/leyden/open/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:59), pid=15382, tid=28419
# assert(cb != nullptr && cb->is_nmethod()) failed: nmethod expected
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-06-26-1746082.vkozlov...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-06-26-1746082.vkozlov..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
...
Current CompileTask:
C1: F9322 C0 Q0 S6270 2840 b 2 java.util.zip.InflaterInputStream::close (34 bytes)
Stack: [0x000000016cdec000,0x000000016cfef000], sp=0x000000016cfed1d0, free space=2052k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x11dd564] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x544 (nativeInst_aarch64.cpp:59)
V [libjvm.dylib+0x11ddd14] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V [libjvm.dylib+0x58f220] print_error_for_unit_test(char const*, char const*, char*)+0x0
V [libjvm.dylib+0xe3ddb4] NativeCallTrampolineStub::destination(nmethod*) const+0x0
V [libjvm.dylib+0x12ac4] SCCache::write_relocations(CodeBuffer*, unsigned int&)+0x34c
To fix Leyden premain, I'd suggest changing the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod, and optionally restoring the following performance optimization (with a comment as suggested by Vladimir):
// Performance optimization: no need to call find_blob() if it is a self-call
if (destination == addr) {
  return destination;
}
But I don't have a strong opinion on whether it should be fixed here or only in Leyden.
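For concreteness, one possible shape of the relaxed check suggested above (a sketch only; the exact form was left open in the discussion):
CodeBlob* cb = CodeCache::find_blob(addr);
assert(cb != nullptr, "CodeBlob expected");
if (!cb->is_nmethod()) {
  // Not an nmethod: return the "raw" PC-relative destination instead of asserting.
  return destination;
}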
To fix Leyden premain, I'd suggest changing the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod . . . But I don't have a strong opinion on whether it should be fixed here or only in Leyden.
Yes, I agree that will work as a solution.
I would recommend making this change in main. I think it is reasonable to expect NativeCall::destination() to be able to access the target for any instruction that can be viewed as a NativeCall, irrespective of whether it is embedded in an nmethod or some other blob. Clearly, the assert confirms that the current mainline code does not use it for anything other than an nmethod but there is nothing to say that it has to remain that way. Leyden is just one potential case where we might want to use it for some other blob.
@adinn is right. I thought that it mostly affects code during CodeBlob expansion, but it does not.
@vnkozlov I was more right than I even realized! I was only concerned about generated stubs but, as you point out, we will also call NativeCall::destination() when processing a native call from JITted method code while it is still residing in a CodeBuffer.
Restoring the performance check in NativeCall::destination() was enough for Leyden to work (PetClinic with one- and five-step workflows):
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
+  // Performance optimization: no need to call find_blob() if it is a self-call
+  if (destination == addr) {
+    return destination;
+  }
@dean-long, I think it should be added to your changes.
@adinn, I suggest you test these changes with the Leyden changes for stubs.
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
+  // Performance optimization: no need to call find_blob() if it is a self-call
+  if (destination == addr) {
+    return destination;
+  }
. . .
@adinn, I suggest you test these changes with the Leyden changes for stubs.
@vnkozlov I applied @dean-long's patch to my Leyden premain repo that saves and restores generated stubs. Without the above extra patch it crashes. With it everything works fine.
So, @dean-long assuming the above tweak is applied I believe it is good to go.
Unfortunately, adding the shortcut for self-calls is not enough for Leyden. Trampoline calls to always-reachable targets are bound early to their destination, so there can be NativeCalls that are not self-calls. To see this in a debug build, this line needs to be adjusted:
static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M);
Do we generate trampolines for "always-reachable targets"? Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?
Do we generate trampolines for "always-reachable targets"?
No, there's no trampoline stub. But we still call destination().
Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?
static const uint64_t branch_range = 128 * M;
static const uint64_t branch_range = 128 * M;
Okay, I will try.
static const uint64_t branch_range = 128 * M;
Okay, I will try.
With this I hit the next assert during normal C2 compilation (during the output phase), which does not seem Leyden-specific:
# Internal Error (/Users/vkozlov/work/leyden/open/src/hotspot/cpu/aarch64/aarch64.ad:2244), pid=27141, tid=28675
# assert(masm-> offset() - offset <= (int) size_exception_handler()) failed: overflow
#
It was missing the check for Leyden being 'ON':
static bool codestub_branch_needs_far_jump() {
+  if (SCCache::is_on_for_write()) {
+    return true;
+  }
After fixing it I hit the assert in nativeInst_aarch64.cpp as @dean-long predicted:
# Internal Error (/Users/vkozlov/work/leyden/open/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:63), pid=75825, tid=27139
# assert(cb != nullptr && cb->is_nmethod()) failed: nmethod expected
AArch64 binds some trampoline call-sites early, thanks to its is_always_within_branch_range() check. This allows a false positive match with a trampoline stub during code buffer expansion in rare situations.
I am looking at the fix. I am not sure it is a proper fix.
is_always_within_branch_range() returns a value based on the static configuration of the CodeCache. The returned value stays correct regardless of CodeBuffer expansion.
I think the problem might be that CodeBuffer::relocate_code_to assumes the destination gets the final layout. So using it for a CodeBuffer which is not finalized might be a problem. I have had issues with the function when I tried to implement relocation of nmethods to different places in the CodeCache.
I will let Dean answer Evgeny.
Here is the result of my investigation. I think we can replace nmethod::stub_contains() with CodeBlob::code_contains() to check for trampoline code:
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
  // Performance optimization: no need to call find_blob() if it is a self-call
  if (destination == addr) {
    return destination;
  }
  // Do we use a trampoline stub for this call?
  CodeBlob* cb = CodeCache::find_blob(addr);
  assert(cb != nullptr, "CodeBlob expected");
  if (cb->code_contains(destination) && is_NativeCallTrampolineStub_at(destination)) {
    // Yes we do, so get the destination from the trampoline stub.
    const address trampoline_stub_addr = destination;
    destination = nativeCallTrampolineStub_at(trampoline_stub_addr)->destination();
  }
  return destination;
}
This passes Leyden testing.
My concern is that simply returning the raw address when it is not an nmethod could be incorrect for the Leyden case, where we process a CodeBuffer that contains a code layout similar to an nmethod's and can have a trampoline call.
To answer @eastig, it is true that the branch target is correct in the source buffer during expansion. However, because the target is PC-relative, it can be temporarily incorrect when copied to the destination buffer. That is the problem with trusting the target from the destination buffer during expansion/copying.
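As a standalone illustration of that point (toy code, not HotSpot internals): a PC-relative displacement encodes "target minus instruction address", so recomputing the target from bytes that have been copied to a new address gives a value shifted by the copy distance until relocation fixup rewrites the displacement.
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t old_pc = 0x100000;                    // call instruction in the source CodeBuffer
  uint64_t target = 0x104000;                    // its real destination
  int64_t  disp   = (int64_t)(target - old_pc);  // what the instruction actually encodes

  uint64_t new_pc = 0x200000;                    // same bytes after copying/expansion
  uint64_t seen   = new_pc + disp;               // destination recomputed from the copy

  // Until relocation fixup adjusts disp, the copied call appears to target
  // 0x204000 rather than 0x104000.
  printf("recomputed target 0x%llx, real target 0x%llx\n",
         (unsigned long long)seen, (unsigned long long)target);
  return 0;
}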
My concern is that simply returning the raw address when it is not an nmethod could be incorrect for the Leyden case, where we process a CodeBuffer that contains a code layout similar to an nmethod's and can have a trampoline call.
Yes, I had the same concern initially. But there is no way to connect the stub owner call site to the trampoline stub without calling trampoline_stub_Relocation::get_trampoline_for(), which requires an nmethod. Even if we tried to support it for CodeBuffers, it would be tricky because the target is PC-relative and the distance between the code section and stub section can change.
Okay. So the change I proposed will reintroduce the issue, because cb->code_contains(destination) && is_NativeCallTrampolineStub_at(destination) may match incorrect trampoline code after expansion. :(
Looks like for Leyden (in the Leyden branch) we need to avoid binding calls even if the destination is reachable, so that we only have the destination == addr case for trampoline calls when we process the CodeBuffer. Looks like we are missing another case for the SCCache::is_on_for_write() check.
Looks like for Leyden (in the Leyden branch) we need to avoid binding calls even if the destination is reachable, so that we only have the destination == addr case for trampoline calls when we process the CodeBuffer
Any destination == addr call needs a trampoline stub to store the final destination. The benefit of early binding for always-reachable calls is that we can avoid creating a trampoline stub. An alternative would be to always store the destination in the CallRelocation.
We should be pessimistic in Leyden. When we load AOT code there is no guarantee that the destination is reachable.
x86 uses the flag ForceUnreachable, which we set to true in Leyden. AArch64 does not use this flag, so we have to find all places where there are assumptions about reachability.
So for Leyden it sounds like you need to change is_always_within_branch_range().
So for Leyden it sounds like you need to change is_always_within_branch_range().
Or perhaps just adapt MacroAssembler::far_branches(). It returns false if the code cache max range exceeds branch_range. In Leyden we can make it return false when we are generating AOT code.
Oops, sorry, I got that the wrong way round. We need to change is_always_within_branch_range() as @dean-long suggested.
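A hedged sketch of that Leyden-side change, in the same spirit as the earlier SCCache::is_on_for_write() diff (the exact signature and placement are assumptions, and this would live in the Leyden branch, not mainline):
static bool is_always_within_branch_range(Address entry) {
+  // Be pessimistic while writing the AOT code cache: never bind calls early,
+  // so trampoline calls keep destination == addr until the nmethod exists.
+  if (SCCache::is_on_for_write()) {
+    return false;
+  }
  ...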