jdk
jdk copied to clipboard
8330171: Lazy W^X switch implementation
An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the si_addr belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed. Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with asgct_bottom and async-profiler.
Additional testing:
- [x] MacOS AArch64 server fastdebug gtets
- [ ] MacOS AArch64 server fastdebug jtreg:hotspot:tier4
- [ ] Benchmarking
@apangin and @parttimenerd could you please check the patch on your scenarios??
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8330171: Lazy W^X switch implementation (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18762/head:pull/18762
$ git checkout pull/18762
Update a local copy of the PR:
$ git checkout pull/18762
$ git pull https://git.openjdk.org/jdk.git pull/18762/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18762
View PR using the GUI difftool:
$ git pr show -t 18762
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18762.diff
Webrev
:wave: Welcome back snazarki! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.
@snazarkin The following labels will be automatically applied to this pull request:
graalhotspotserviceabilityshenandoah
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
Hello Sergey. W^X mode was initially forced by Apple to prevent writing to executable memory, as a security feature. This change just eliminates this security feature at all, doesn't it ? Basically: "want to write to Executable memory ? ok, here you go"
Hello Sergey. W^X mode was initially forced by Apple to prevent writing to executable memory, as a security feature. This change just eliminates this security feature at all, doesn't it ? Basically: "want to write to Executable memory ? ok, here you go"
Yes @VladimirKempik, you are right. No, we should not do this.
Instead, when we enter the VM we could track the current state of W^X and whenever we enter a block that needs to write into code space we would set W if needed. When we leave the VM or when we call back into Java we would set X, if needed. The cost of doing this would be small, but we'd have to find all the blocks that need to write into code space. This might be more effort than we want to make, though.
So where would be need to make the transitions to W? At a guess, whenever we start assembling something, and in all of the methods in nativeInst_aarch64.?pp, and in class Patcher. And to X, in the call stub and a few other places.
That would minimize the transitions exactly to the set of places we actually need.
This ( changes of this PR) can be left as an addition to existing mechanism. Disabled by default and can be enabled with a special (DEVELOP) option. So this can't be enabled on production builds, but can be useful to debug w^x issues on debug builds ( with additional logging)
Thanks @theRealAph, @VladimirKempik
Instead, when we enter the VM we could track the current state of W^X and whenever we enter a block that needs to write into code space we would set W if needed. When we leave the VM or when we call back into Java we would set X, if needed. The cost of doing this would be small, but we'd have to find all the blocks that need to write into code space. This might be more effort than we want to make, though.
It is the way in which it is implemented in the current code. Unfortunately, it is hardly maintainable solution that suffers from issues like [1-5].
As I understand it, your concern is that the implementation doesn't prevent rogue from writing to the code cache with some unsafe api?
- https://bugs.openjdk.org/browse/JDK-8302736
- https://bugs.openjdk.org/browse/JDK-8327990
- https://bugs.openjdk.org/browse/JDK-8327036
- https://bugs.openjdk.org/browse/JDK-8304725
- https://bugs.openjdk.org/browse/JDK-8307549
Mailing list message from Andrew Haley on hotspot-dev:
On 4/12/24 18:18, Sergey Nazarkin wrote:
?It is the way in which it is implemented in the current code.
No, it's not. That's not what we do at all.
We don't set W^X when we need it: instead, we set it at certain times in the hope that it'll be needed. I'm suggesting we should set W^X *exactly* where we need it, such as at patching methods. Not at VM entry.
Get rid of all the assert_wx_state. If deoptimization needs WXWrite, then it should set it, not hope for someone else to have done it.
I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call?
In other words, how fine granular can we get without incurring too high a cost?
I have one question, and I'm sorry if it has been answered before. How expensive is changing the mode? Is it just a status variable in user-space pthread lib? Or does it need a system call?
In other words, how fine granular can we get without incurring too high a cost?
It's expensive. We've seen it cause significant slowdowns in Java->VM transitions.
Mailing list message from Kim Barrett on hotspot-dev:
On Apr 13, 2024, at 8:35 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
On 4/12/24 18:18, Sergey Nazarkin wrote:
?It is the way in which it is implemented in the current code.
No, it's not. That's not what we do at all.
We don't set W^X when we need it: instead, we set it at certain times in the hope that it'll be needed. I'm suggesting we should set W^X *exactly* where we need it, such as at patching methods. Not at VM entry.
Get rid of all the assert_wx_state. If deoptimization needs WXWrite, then it should set it, not hope for someone else to have done it.
There was a recent internal-to-Oracle discussion about W^X, which led to something that I think is along the lines of what @aph is suggesting, including a prototype. I will poke some of the folks who were more deeply involved in that discussion (I was just casually following along, and am not knowledgeable in this area), but it's a weekend now, so it might take them some time to respond.
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: <https://mail.openjdk.org/pipermail/hotspot-dev/attachments/20240413/343e5f80/signature.asc>
I agree that this PR effectively removes the whole idea behind JIT_MAP and thus is a bad idea security wise. Still it has some value.
@snazarkin do you have numbers on how many transitions are done with your PR vs. the current state when running the same program? That would give us a lower bound on the amount of transitions needed and define a goal for future improvements in that area.
do you have numbers on how many transitions are done with your PR vs. the current state when running the same program?
With just simple java -version it is ~180 vs ~9500 (new vs old), for java -help ~1120 vs ~86300. For the applications the ration is about the same.
What about granting WXWrite only if the current thread is in _thread_in_vm?
That would be more restrictive and roughly equivalent how it currently works. Likely there are some places then that should be granted WXWrite eagerly because they need WXWrite without _thread_in_vm. E.g. the JIT compiler threads should have WXWrite and never WXExec (I assume) which should be checked in the signal handler.
The patch doesn't protect against native agents, as this is obviously impossible. The current code doesn't do that either. For the bytecode, it doesn't prevent the attacker from abusing unsafe api to modify code cache. However unsafe functions are already considered "safe" and we proactively enable WXWrite as well as move thread to _thread_in_vm state (@reinrich). JITed code can't write to the cache either with or without the patch.
I totally get the sense of loss of security. But is this really the case?
The patch doesn't protect against native agents, as this is obviously impossible. The current code doesn't do that either. For the bytecode, it doesn't prevent the attacker from abusing unsafe api to modify code cache. However unsafe functions are already considered "safe" and we proactively enable WXWrite as well as move thread to
_thread_in_vmstate (@reinrich). JITed code can't write to the cache either with or without the patch.I totally get the sense of loss of security. But is this really the case?
I think it is. W^X is intended (amongst other things) to protect against the use of gadgets, from buffer overflow exploits in non-java code to ROP programming. At present, in order to generate code and execute it, you first have to be able to make the JIT code writable, then write the code, then make it executable. then jump to the code. And the exploit writer might have to do some or all of this by finding gadgets. If we were to merge this patch then all the attacker would have to do is write code to memory and find a way to jump to it, and the automatic switch-on-segfault in this patch would do the all the work the attacker needs.
It makes far more sense to tag those places that actually need to change W^X access, and only switch there.
You could argue that any switching of W^X on a write to code space, then switching it back on jumping (or returning) to Java code, even what we already do, is effectively the same thing. Kinda, but it's not on just any attempt to write to code space or any attempt to jump into code, it's at the places we choose, and we can be careful to limit those places.
But surely the JDK is not the most vulnerable part of the stack anyway? I'd agree with that, of course, but I don't think that's sufficient reason to decide to bypass an OS security mechanism.
We are trying to reduce the size of the attack surface.
To add a little to @theRealAph 's point, we should avoid painting ourselves into a corner. I don't know how the platform is going to evolve, but I'd be nervous about fighting against the intentions of the protections.
@snazarkin this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout w_x_z
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
@snazarkin This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@snazarkin This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.