8321509: False positive in get_trampoline fast path causes crash
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8321509: False positive in get_trampoline fast path causes crash (Bug - P3)
Reviewers
- Vladimir Kozlov (@vnkozlov - Reviewer)
- Tobias Hartmann (@TobiHartmann - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19796/head:pull/19796
$ git checkout pull/19796
Update a local copy of the PR:
$ git checkout pull/19796
$ git pull https://git.openjdk.org/jdk.git pull/19796/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19796
View PR using the GUI difftool:
$ git pr show -t 19796
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19796.diff
Webrev
:wave: Welcome back dlong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@dean-long This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8321509: False positive in get_trampoline fast path causes crash
Reviewed-by: kvn, adinn, thartmann
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 195 new commits pushed to the master branch:
- b32e4a68bca588d908bd81a398eb3171a6876dc5: 8335356: Shenandoah: Improve concurrent cleanup locking
- 62cbf70346e78ca94ce6ea4ba5a308ea0a2bbfa8: 8336085: Fix simple -Wzero-as-null-pointer-constant warnings in CDS code
- 2928753bd95356467e4fe42ee391e45d1cb6e89c: 8324966: Allow selecting jtreg test case by ID from make
- 1772a929af0c31bf22153cc19c5d11b00273453b: 8334457: Test javax/swing/JTabbedPane/bug4666224.java fail on macOS with because pressing the ‘C’ key does not switch the layout to WRAP_TAB_LAYOUT
- b7d0eff5ad77e338b237773d2fc047eea3d2ac12: 8207908: JMXStatusTest.java fails assertion intermittently
- cf940e139a76e5aabd52379b8a87065d82b2284c: 8335553: [Graal] Compiler thread calls into jdk.internal.vm.VMSupport.decodeAndThrowThrowable and crashes in OOM situation
- b363de8c9fbf7d9e4aade41a2e883cc83ced320b: 8335946: DTrace code snippets should be generated when DTrace flags are enabled
- d6c6847e32673d36a1958cefd1851ec9f3b1e2ad: 8335743: jhsdb jstack cannot print some information on the waiting thread
- cad68e06ecad1e19091d1af9c0f9b8145d6842fb: 8335935: Chained builders not sending transformed models to next transforms
- 242f1133f8e1b373de3714cefc7f6701c39707fe: 8334481: [JVMCI] add LINK_TO_NATIVE to MethodHandleAccessProvider.IntrinsicMethod
- ... and 185 more: https://git.openjdk.org/jdk/compare/974dca80df71c5cbe492d1e8ca5cee76bcc79358...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@dean-long The following label will be automatically applied to this pull request:
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 03: Full - Incremental (5845db59)
- 02: Full - Incremental (09ed1da4)
- 01: Full - Incremental (e699b531)
- 00: Full (46e77d32)
AArch64 binds some trampoline call-sites early, thanks to its is_always_within_branch_range() check. This allows a false positive match with a trampoline stub during code buffer expansion in rare situations. To fix this, this PR makes the following changes:
- Do not call get_trampoline() in Relocation::pd_call_destination or pd_set_call_destination, as the destination they use cannot be trusted during fixup.
- Restrict NativeCall::get_trampoline() to operate only on nmethods, not CodeBuffers (or BufferBlobs) (see the sketch after this list)
- Fix up trampoline stub "owners" (call sites) as late as possible, in the new trampoline_stub_Relocation::pd_fix_owner_after_move(), and only if the destination is an nmethod.
- Avoid calling NativeCall::set_destination_mt_safe() during CodeBuffer fixup, which allows assert_lock to also go away
- Detect self-calls in NativeCall::destination() to avoid unnecessary call to find_blob()
- Add NativeCall fast paths for pd_call_destination/pd_set_call_destination
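A rough sketch of the nmethod-only restriction from the second item above (names follow the existing AArch64 code; the guard shape and surrounding logic in the actual patch may differ):
address NativeCall::get_trampoline() {
  address call_addr = addr_at(0);
  CodeBlob* code = CodeCache::find_blob(call_addr);
  // Only trust trampoline lookups inside a finished nmethod; a CodeBuffer or
  // BufferBlob does not have its final layout yet, so report "no trampoline".
  if (code == nullptr || !code->is_nmethod()) {
    return nullptr;
  }
  return trampoline_stub_Relocation::get_trampoline_for(call_addr, code->as_nmethod());
}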
Thanks Vladimir.
I am hoping an AArch64 expert can take a look at this. @theRealAph maybe?
This solution looks ok to me as far as jdk mainline is concerned. However, I think there is a problem as far as Leyden is concerned.
The code changes Relocation::pd_call_destination to always expect its associated call to be embedded within an nmethod when orig_addr is null (i.e. when it is called with no args as reloc.pd_call_destination()). This is where the problem arises.
Currently, Leyden calls Relocation::pd_call_destination() from CallRelocation::destination() (and also from trampoline_stub_Relocation::destination()) when storing an nmethod to the CDS code cache. It needs to do this in order to be able to track relocs of type virtual_call_type, opt_virtual_call_type, static_call_type and runtime_call_type (also trampoline_stub_type). That is because all these relocs need their call destination to be adjusted when the nmethod is restored from the CDS code cache.
However, we already have prototype code in Leyden to store generated blobs to the CDS code cache. These blobs may legitimately include runtime_call_type relocs which also need tracking and adjusting at restore. For example, shared runtime or compiler stubs may call out to the JVM. Likewise, stubs in a stub generator blob may need to call out to the JVM or to a stub in some earlier generated blob. So, Leyden will need to call CallRelocation::destination() in cases where the associated call is embedded in a non-nmethod. Note that these calls will never employ trampolines.
The obvious fix is to modify Relocation::pd_call_destination so that it drops through to call MacroAssembler::pd_call_destination if the incoming blob is not an nmethod.
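A hedged sketch of that drop-through, in the same diff-fragment style used elsewhere in this thread (the exact placement inside Relocation::pd_call_destination is an assumption):
address Relocation::pd_call_destination(address orig_addr) {
+  // Drop through for non-nmethod blobs (e.g. Leyden-generated stubs): there is
+  // no trampoline stub to consult, so read the target from the instruction.
+  if (orig_addr == nullptr) {
+    CodeBlob* cb = CodeCache::find_blob(addr());
+    if (cb == nullptr || !cb->is_nmethod()) {
+      return MacroAssembler::pd_call_destination(addr());
+    }
+  }
  ...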
@adinn is right. I thought that it mostly affects code during CodeBlob expansion, but it does not. I applied the patch to the Leyden/premain repo and hit an assert when we generate AOT code, because we are still processing the CodeBuffer before the nmethod is created:
# Internal Error (/work/leyden/open/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:59), pid=15382, tid=28419
# assert(cb != nullptr && cb->is_nmethod()) failed: nmethod expected
#
# JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-06-26-1746082.vkozlov...)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-06-26-1746082.vkozlov..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
...
Current CompileTask:
C1: F9322 C0 Q0 S6270 2840 b 2 java.util.zip.InflaterInputStream::close (34 bytes)
Stack: [0x000000016cdec000,0x000000016cfef000], sp=0x000000016cfed1d0, free space=2052k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x11dd564] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x544 (nativeInst_aarch64.cpp:59)
V [libjvm.dylib+0x11ddd14] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V [libjvm.dylib+0x58f220] print_error_for_unit_test(char const*, char const*, char*)+0x0
V [libjvm.dylib+0xe3ddb4] NativeCallTrampolineStub::destination(nmethod*) const+0x0
V [libjvm.dylib+0x12ac4] SCCache::write_relocations(CodeBuffer*, unsigned int&)+0x34c
To fix Leyden premain, I'd suggest changing the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod, and optionally restoring the following performance optimization (with a comment as suggested by Vladimir):
// Performance optimization: no need to call find_blob() if it is a self-call
if (destination == addr) {
  return destination;
}
But I don't have a strong opinion on whether it should be fixed here or only in Leyden.
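For concreteness, one possible shape of the relaxed check suggested above (a sketch only; the exact form was left open in the discussion):
CodeBlob* cb = CodeCache::find_blob(addr);
assert(cb != nullptr, "CodeBlob expected");
if (!cb->is_nmethod()) {
  // Not an nmethod: return the "raw" PC-relative destination instead of asserting.
  return destination;
}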
To fix Leyden premain, I'd suggest changing the "nmethod expected" assert in NativeCall::destination() into conditional code that returns the "raw" destination if it is not an nmethod . . . But I don't have a strong opinion on whether it should be fixed here or only in Leyden.
Yes, I agree that will work as a solution.
I would recommend making this change in main. I think it is reasonable to expect NativeCall::destination() to be able to access the target for any instruction that can be viewed as a NativeCall, irrespective of whether it is embedded in an nmethod or some other blob. Clearly, the assert confirms that the current mainline code does not use it for anything other than an nmethod but there is nothing to say that it has to remain that way. Leyden is just one potential case where we might want to use it for some other blob.
@adinn is right. I thought that it mostly affects code during CodeBlob expansion, but it does not.
@vnkozlov I was more right than I even realized! I was only concerned about generated stubs but, as you point out, we will also call NativeCall::destination() when processing a native call from JITted method code while it is still residing in a CodeBuffer.
Restoring the performance check in NativeCall::destination() was enough for Leyden to work (PetClinic with one- and five-step workflows):
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
+  // Performance optimization: no need to call find_blob() if it is a self-call
+  if (destination == addr) {
+    return destination;
+  }
@dean-long, I think it should be added to your changes.
@adinn, I suggest you test these changes with the Leyden changes for stubs.
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
+  // Performance optimization: no need to call find_blob() if it is a self-call
+  if (destination == addr) {
+    return destination;
+  }
. . .
@adinn, I suggest you test these changes with the Leyden changes for stubs.
@vnkozlov I applied @dean-long's patch to my Leyden premain repo that saves and restores generated stubs. Without the above extra patch it crashes. With it everything works fine.
So, @dean-long assuming the above tweak is applied I believe it is good to go.
Unfortunately, adding the shortcut for self-calls is not enough for Leyden. Trampoline calls to always-reachable targets are bound early to their destination, so there can be NativeCalls that are not self-calls. To see this in a debug build, this line needs to be adjusted:
static const uint64_t branch_range = NOT_DEBUG(128 * M) DEBUG_ONLY(2 * M);
Do we generate trampolines for "always-reachable targets"? Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?
Do we generate trampolines for "always-reachable targets"?
No, there's no trampoline stub. But we still call destination().
Can you clarify how "branch_range" should be adjusted to trigger the issue for Leyden?
static const uint64_t branch_range = 128 * M;
static const uint64_t branch_range = 128 * M;
Okay, I will try.
static const uint64_t branch_range = 128 * M;
Okay, I will try.
With this I hit the next assert during normal C2 compilation (during the output phase), which does not seem Leyden-specific:
# Internal Error (/Users/vkozlov/work/leyden/open/src/hotspot/cpu/aarch64/aarch64.ad:2244), pid=27141, tid=28675
# assert(masm-> offset() - offset <= (int) size_exception_handler()) failed: overflow
#
It was missing the check for Leyden being 'ON':
static bool codestub_branch_needs_far_jump() {
+  if (SCCache::is_on_for_write()) {
+    return true;
+  }
After fixing it I hit the assert in nativeInst_aarch64.cpp as @dean-long predicted:
# Internal Error (/Users/vkozlov/work/leyden/open/src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp:63), pid=75825, tid=27139
# assert(cb != nullptr && cb->is_nmethod()) failed: nmethod expected
AArch64 binds some trampoline call-sites early, thanks to its is_always_within_branch_range() check. This allows a false positive match with a trampoline stub during code buffer expansion in rare situations.
I am looking at the fix. I am not sure it is a proper fix.
is_always_within_branch_range() returns a value based on the static configuration of the CodeCache. The returned value stays correct regardless of CodeBuffer expansion.
I think the problem might be that CodeBuffer::relocate_code_to assumes the destination gets the final layout. So using it for a CodeBuffer which is not finalized might be a problem. I have had issues with the function when I tried to implement relocation of nmethods to different places in the CodeCache.
I will let Dean answer Evgeny.
Here is the result of my investigation. I think we can replace nmethod::stub_contains() with CodeBlob::code_contains() to check for trampoline code:
address NativeCall::destination() const {
  address addr = instruction_address();
  address destination = addr + displacement();
  // Performance optimization: no need to call find_blob() if it is a self-call
  if (destination == addr) {
    return destination;
  }
  // Do we use a trampoline stub for this call?
  CodeBlob* cb = CodeCache::find_blob(addr);
  assert(cb != nullptr, "CodeBlob expected");
  if (cb->code_contains(destination) && is_NativeCallTrampolineStub_at(destination)) {
    // Yes we do, so get the destination from the trampoline stub.
    const address trampoline_stub_addr = destination;
    destination = nativeCallTrampolineStub_at(trampoline_stub_addr)->destination();
  }
  return destination;
}
This passes Leyden testing.
My concern is that simply returning the raw address when it is not an nmethod could be incorrect for the Leyden case, where we process a CodeBuffer that contains a code layout similar to an nmethod's and can have a trampoline call.
To answer @eastig, it is true that the branch target is correct in the source buffer during expansion. However, because the target is PC-relative, it can be temporarily incorrect when copied to the destination buffer. That is the problem with trusting the target from the destination buffer during expansion/copying.
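As a standalone illustration of that point (toy code, not HotSpot internals): a PC-relative displacement encodes "target minus instruction address", so recomputing the target from bytes that have been copied to a new address gives a value shifted by the copy distance until relocation fixup rewrites the displacement.
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t old_pc = 0x100000;                    // call instruction in the source CodeBuffer
  uint64_t target = 0x104000;                    // its real destination
  int64_t  disp   = (int64_t)(target - old_pc);  // what the instruction actually encodes

  uint64_t new_pc = 0x200000;                    // same bytes after copying/expansion
  uint64_t seen   = new_pc + disp;               // destination recomputed from the copy

  // Until relocation fixup adjusts disp, the copied call appears to target
  // 0x204000 rather than 0x104000.
  printf("recomputed target 0x%llx, real target 0x%llx\n",
         (unsigned long long)seen, (unsigned long long)target);
  return 0;
}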
My concern is that simply returning the raw address when it is not an nmethod could be incorrect for the Leyden case, where we process a CodeBuffer that contains a code layout similar to an nmethod's and can have a trampoline call.
Yes, I had the same concern initially. But there is no way to connect the stub owner call site to the trampoline stub without calling trampoline_stub_Relocation::get_trampoline_for(), which requires an nmethod. Even if we tried to support it for CodeBuffers, it would be tricky because the target is PC-relative and the distance between the code section and stub section can change.
Okay. So the change I proposed will reintroduce the issue, because cb->code_contains(destination) && is_NativeCallTrampolineStub_at(destination) may match incorrect trampoline code after expansion. :(
Looks like for Leyden (in the Leyden branch) we need to avoid binding calls even if the destination is reachable, so that we only have the destination == addr case for trampoline calls when we process the CodeBuffer. Looks like we are missing another case for the SCCache::is_on_for_write() check.
Looks like for Leyden (in the Leyden branch) we need to avoid binding calls even if the destination is reachable, so that we only have the destination == addr case for trampoline calls when we process the CodeBuffer
Any destination == addr call needs a trampoline stub to store the final destination. The benefit of early binding for always-reachable calls is that we can avoid creating a trampoline stub. An alternative would be to always store the destination in the CallRelocation.
We should be pessimistic in Leyden. When we load AOT code there is no guarantee that the destination is reachable.
x86 uses the flag ForceUnreachable, which we set to true in Leyden. AArch64 does not use this flag, so we have to find all places where there are assumptions about reachability.
So for Leyden it sounds like you need to change is_always_within_branch_range().
So for Leyden it sounds like you need to change is_always_within_branch_range().
Or perhaps just adapt MacroAssembler::far_branches(). It returns false if the code cache max range exceeds branch_range. In Leyden we can make it return false when we are generating AOT code.
Oops, sorry, I got that the wrong way round. We need to change is_always_within_branch_range() as @dean-long suggested.
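A hedged sketch of that Leyden-side change, in the same spirit as the earlier SCCache::is_on_for_write() diff (the exact signature and placement are assumptions, and this would live in the Leyden branch, not mainline):
static bool is_always_within_branch_range(Address entry) {
+  // Be pessimistic while writing the AOT code cache: never bind calls early,
+  // so trampoline calls keep destination == addr until the nmethod exists.
+  if (SCCache::is_on_for_write()) {
+    return false;
+  }
  ...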