valhalla
valhalla copied to clipboard
8336845: [lworld] Virtual threads don't support the value class calling convention
Background
The scalarized calling convention will pass value objects by-value by passing their field values instead of a reference. Now this is only done by C2 while the interpreter and C1 still use references to heap "buffers". When calling from the interpreter or C1 to C2 compiled code, we nee to "unpack" from the heap buffer. The difficulty is that unpacking of value type arguments in the nmethod entry point might require additional stack space. For example, let's assume we have a method that takes two value type arguments v1 and v2 of a type that has four int fields f1 - f4.
The register/stack layout after method entry for an un-scalarized call would look like this:
(1)
rsi: v1
rdx: v2
rcx:
r8:
r9:
rdi:
0x18: ... <- sp of caller
0x10: return_address
0x08: rbp
0x00: locals
And for the scalarized call it would look like this:
(2)
rsi: v1.f1
rdx: v1.f2
rcx: v1.f3
r8: v1.f4
r9: v2.f1
rdi: v2.f2
0x28: ... <- sp of caller
0x20: v2.f4
0x18: v2.f3
0x10: return_address
0x08: rbp
0x00: locals
In this case, the scalarized calling convention requires two additional stack slots to pass v2.f3 and v2.f4 because we don't have enough registers. Since we cannot overwrite stack slots in the caller frame, we need to extend the callee frame before unpacking.
To convert from (1) to (2), we have a [Verified Value Entry Point] that will extend the stack:
[Verified Value Entry Point]
pop %r13 ; save return address
sub $0x20,%rsp ; extend stack (make sure stack remains aligned)
push %r13 ; copy return address to top of stack
... ; unpack value type arguments
push %rbp ; save rbp
sub $0x10,%rsp ; create frame (constant size)
movq $0x38,0x8(%rsp) ; save stack increment 0x10 + size(rbp) + 0x20
jmpq [Entry]
This is implemented in MacroAssembler::extend_stack_for_inline_args:
https://github.com/openjdk/valhalla/blob/12e5ddae614f88ad9262cd73f06003b569495cc4/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L7028-L7039
In addition, we have a [Verified Entry Point] that will be used for scalarized to scalarized calls and will not extend the stack. It still needs to set the stack increment:
[Verified Entry Point]
push %rbp ; save rbp
sub $0x10,%rsp ; create frame (constant size)
movq $0x18,0x8(%rsp) ; save stack increment 0x10 + size(rbp)
[Entry]
... ; method body
mov 0x10(%rsp),%rbp ; restore rbp
add 0x8(%rsp),%rsp ; repair stack
retq
This is implemented in C2_MacroAssembler::verified_entry:
https://github.com/openjdk/valhalla/blob/12e5ddae614f88ad9262cd73f06003b569495cc4/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L124-L128
This protocol allows for an efficient conversion between the calling conventions with minimal impact to scalarized code. The only difference is that the epilog that used to have a constant stack increment now has to repair the stack:
[...]
add $0x10,%rsp ; repair stack
pop %rbp ; restore rbp
retq
This is implemented in MacroAssembler::remove_frame:
https://github.com/openjdk/valhalla/blob/12e5ddae614f88ad9262cd73f06003b569495cc4/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L7271-L7274
Also, stack walking in frame::safe_for_sender and frame::sender_for_compiled_frame now needs to repair the stack to get a correct sender_sp if the current frame extended the stack. This is done in frame::repair_sender_sp() by adding the sp_inc if necessary:
https://github.com/openjdk/valhalla/blob/12e5ddae614f88ad9262cd73f06003b569495cc4/src/hotspot/cpu/x86/frame_x86.cpp#L702-L714
Deoptimization uses the same stack walking code and is therefore not affected.
This PR
Code for freezing and thawing virtual threads walks stack frames and therefore needs to be aware of C2 compiled frames that have a dynamic size and might need a stack repair.
Thanks, Tobias
Progress
- [x] Change must not contain extraneous whitespace
Issue
- JDK-8336845: [lworld] Virtual threads don't support the value class calling convention (Bug - P2)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1399/head:pull/1399
$ git checkout pull/1399
Update a local copy of the PR:
$ git checkout pull/1399
$ git pull https://git.openjdk.org/valhalla.git pull/1399/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1399
View PR using the GUI difftool:
$ git pr show -t 1399
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1399.diff
:wave: Welcome back thartmann! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.
@TobiHartmann This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@TobiHartmann This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.