jdk
8331911: Reconsider locking for recently disarmed nmethods
Notes
We spend significant time acquiring the per-nmethod lock because all the threads are in the same nmethod. This change adds double-checked locking by calling is_armed before acquiring the lock.
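A minimal sketch of the double-checked pattern described in the notes, assuming a simplified model: `FakeNMethod`, `entry_barrier`, and `heal_count` are hypothetical names for illustration and are not the HotSpot classes or functions touched by this PR. The point is only that the unlocked `armed` check lets threads racing into an already disarmed nmethod skip the lock entirely.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for an nmethod; not the HotSpot class.
struct FakeNMethod {
    std::atomic<bool> armed{true};  // stand-in for the barrier guard value
    std::mutex lock;                // stand-in for the per-nmethod lock
    int heal_count = 0;             // how often the slow path did real work
};

// Returns true if this call performed the disarm work itself.
bool entry_barrier(FakeNMethod& nm) {
    // First check, outside the lock: threads that arrive after the
    // nmethod was disarmed return here without touching the lock.
    if (!nm.armed.load(std::memory_order_acquire)) {
        return false;
    }
    std::lock_guard<std::mutex> guard(nm.lock);
    // Second check, under the lock: another thread may have disarmed
    // the nmethod while this one was waiting for the lock.
    if (!nm.armed.load(std::memory_order_acquire)) {
        return false;
    }
    ++nm.heal_count;  // placeholder for the GC-specific work
    nm.armed.store(false, std::memory_order_release);
    return true;
}
```

In this model, the first call on a fresh `FakeNMethod` does the work and every later call returns from the first check, so the lock is only contended while the nmethod is still armed.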
Verification
Shenandoah
% /home/neethp/Development/opensource/jdk/build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.706s][info][gc] GC(0) Concurrent marking roots 11.519ms
[0.752s][info][gc] GC(1) Concurrent marking roots 9.833ms
[0.814s][info][gc] GC(2) Concurrent marking roots 10.000ms
[0.855s][info][gc] GC(3) Concurrent marking roots 9.314ms
[0.895s][info][gc] GC(4) Concurrent marking roots 8.937ms
[1.213s][info][gc] GC(5) Concurrent marking roots 12.582ms
[1.340s][info][gc] GC(6) Concurrent marking roots 9.574ms
[1.465s][info][gc] GC(7) Concurrent marking roots 12.791ms
ZGC
% /home/neethp/Development/opensource/jdk/build/linux-x86_64-server-release/images/jdk/bin/java -Xmx1g -Xms1g -XX:+UseShenandoahGC -Xlog:gc ManyThreadsStacks.java 2>&1 | grep "marking roots"
[0.732s][info][gc] GC(0) Concurrent marking roots 10.694ms
[0.782s][info][gc] GC(1) Concurrent marking roots 14.614ms
[0.825s][info][gc] GC(2) Concurrent marking roots 12.700ms
[0.863s][info][gc] GC(3) Concurrent marking roots 9.622ms
[0.904s][info][gc] GC(4) Concurrent marking roots 12.892ms
[1.244s][info][gc] GC(5) Concurrent marking roots 12.422ms
[1.375s][info][gc] GC(6) Concurrent marking roots 12.756ms
[1.503s][info][gc] GC(7) Concurrent marking roots 12.265ms
[1.628s][info][gc] GC(8) Concurrent marking roots 12.309ms
[1.754s][info][gc] GC(9) Concurrent marking roots 12.996ms
[1.879s][info][gc] GC(10) Concurrent marking roots 9.416ms
Issue https://bugs.openjdk.org/browse/JDK-8331911
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Error
⚠️ OCA signatory status must be verified
Issue
- JDK-8331911: Reconsider locking for recently disarmed nmethods (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19285/head:pull/19285
$ git checkout pull/19285
Update a local copy of the PR:
$ git checkout pull/19285
$ git pull https://git.openjdk.org/jdk.git pull/19285/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19285
View PR using the GUI difftool:
$ git pr show -t 19285
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19285.diff
Hi @neethu-prasad, welcome to this OpenJDK project and thanks for contributing!
We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.
If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user neethu-prasad" as summary for the issue.
If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.
@neethu-prasad This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8331911: Reconsider locking for recently disarmed nmethods
Reviewed-by: shade, eosterlund
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 82 new commits pushed to the master branch:
- 4b153e5e051c01ad8d0c3ff335352918c2970fe6: 8306580: Propagate CDS dumping errors instead of directly exiting the VM
- 71a692ab435fdeea4ce8f8db7a55dd735c7c5016: 8321033: Avoid casting Array to GrowableArray
- 55c796946158aab1d019a57b77a33441d7b13065: 8334765: JFR: Log chunk waste
- b2930c5aeedf911ec893734181c1af0573e222f4: 8334040: jdk/classfile/CorpusTest.java timed out
- e825ccfe6652577e4e828e8e4dfe19be0ea77813: 8332362: Implement os::committed_in_range for MacOS and AIX
- 5ac2149b7bde947886533bf5996d977bb8ec66f1: 8334299: Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR
- 2e64d15144be03388104c762816c1ba629da9639: 8334564: VM startup: fatal error: FLAG_SET_ERGO cannot be used to set an invalid value for NonNMethodCodeHeapSize
- 9d4a4bd2c2a4bd16bbc80b602b15b448c52220f6: 8324841: PKCS11 tests still skip execution
- ca5a438e5a4612c66f70c70a9d425eca0e49e84d: 8334571: Extract control dependency rewiring out of PhaseIdealLoop::dominated_by() into separate method
- 05ff3185edd25b381a97f6879f496e97b62dddc2: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572
- ... and 72 more: https://git.openjdk.org/jdk/compare/c94af6f943c179553d1827550847b93491d47506...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @fisk) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
@neethu-prasad The following labels will be automatically applied to this pull request:
- hotspot-gc
- shenandoah
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/covered
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
Webrevs
- 04: Full (86be5b5f)
- 03: Full - Incremental (d474c1b7)
- 02: Full - Incremental (c59034da)
- 01: Full - Incremental (8c5bad2e)
- 00: Full (5a0d5562)
:warning: @neethu-prasad the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:
$ git checkout JDK-8331911
$ git commit --author='Preferred Full Name <[email protected]>' --allow-empty -m 'Update full name'
$ git push
@fisk @stefank -- are you good with this for ZGC?
It seems fine to me that the GC backends are responsible for checking if the nmethod is disarmed outside the lock. However, we have some callers that now check it redundantly. I think those callers should stop doing that now. Otherwise, this looks good to me.
Thanks for the feedback! Looking around the code, I think there are a few places where we can do more changes.
First, remove check here: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/code/nmethod.cpp#L852-L855
This would force us to add the check in super-class implementation here: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/gc/shared/barrierSetNMethod.cpp#L83
Second, we can remove the check here: https://github.com/openjdk/jdk/blob/bc7d9e3d0bc663bbbeb068889082da4a9f0fa8de/src/hotspot/share/gc/shared/barrierSetNMethod.cpp#L175-L181
But it does not seem straightforward, because we currently skip the cross-modification fence based on the is_armed(...) check. Unfortunately, we cannot easily know whether nmethod_entry_barrier acted or not; we only know whether the method is safe. Can we / should we do these refactorings separately?
I see your point. However, this PR is refactoring the code to iron out who is responsible for checking is_armed, so I would prefer if we got that right in this PR. We say it should be the backend code doing that, so the callers shouldn't. I agree with all the changes you just listed and if you make them I would be happy.
Regarding the cross modifying fence, I strongly prefer to not try and be clever. Just run the cross modifying fence unconditionally after calling the backend code. We get there because the barrier was armed anyway.
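The division of responsibility discussed above can be sketched roughly as follows. This is an illustrative model only: `FakeNMethod`, `FakeBackend`, and `FakeStub` are hypothetical types, and the `fences` counter is a placeholder for a real instruction cross-modification fence, which plain C++ cannot express. The name `nmethod_entry_barrier` is borrowed from the discussion; the body here is not HotSpot's.

```cpp
#include <atomic>

// Hypothetical stand-in for an nmethod; not the HotSpot class.
struct FakeNMethod {
    std::atomic<bool> armed{true};
};

// Hypothetical GC backend: it alone decides whether work is needed.
struct FakeBackend {
    int work_done = 0;

    // Returns true when the method is safe to enter.
    bool nmethod_entry_barrier(FakeNMethod& nm) {
        if (!nm.armed.load(std::memory_order_acquire)) {
            return true;  // already disarmed, nothing to do
        }
        ++work_done;      // placeholder for the GC-specific work
        nm.armed.store(false, std::memory_order_release);
        return true;
    }
};

// Hypothetical caller, standing in for the stub slow path.
struct FakeStub {
    FakeBackend backend;
    int fences = 0;  // counts placeholder fences

    bool enter(FakeNMethod& nm) {
        // No is_armed check here: that is the backend's responsibility.
        bool may_enter = backend.nmethod_entry_barrier(nm);
        // Fence unconditionally, whether or not the backend did any work.
        ++fences;
        return may_enter;
    }
};
```

In this model the caller pays for one fence per slow-path entry even when another thread already disarmed the nmethod, which is the trade-off weighed later in the thread.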
@fisk
I've addressed the feedback. Can you take a look?
I did not remove the check here. Removing this check resulted in a timeout when the -XX:+DeoptimizeNMethodBarriersALot flag is set, because it executes the deoptimization code path.
I am quite nervous about having that silly optimization there. So I'm going to have to insist on removing it. Perhaps though, it should be fixed as a separate issue. Allow me to explain myself why it makes me nervous.

The nmethod entry barriers guard modifications to instructions and data through a mixed bag of synchronous and asynchronous cross modifying code. Synchronous cross modifying code is the sane thing to do; we modify instructions, guarded from concurrent execution by a data flag. After modifying the instructions, the data flag is flipped, and observers are allowed to execute the instructions after executing an instruction cross modification fence. On for example AArch64 we only perform synchronous cross modifying code, and the stub has the fencing machinery, so it should be okay.

However, on x86_64, we perform a mix of asynchronous and synchronous cross modifying code. The guard word is the immediate part of a compare instruction. If the new disarmed immediate is observed by concurrent execution, instruction cache coherency guarantees that we will correctly observe the cross modified instructions when they are subsequently executed. However, when we go into the stub slow path, and check is_armed etc, these are data reads. That makes the dance entirely different as it suddenly performs synchronous cross modifying code. If a data read observes that the instructions have been modified, we don't have the same level of guarantees any longer, unless we perform an instruction cross modification fence.

So my concern here, is that the silly optimization to fix some verification code timeout or whatever, is in fact causing a real correctness problem for real release builds, on x86_64. By skipping the cross modification fence we perform an incomplete synchronous instruction cross modification dance that isn't sound.
Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?
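The synchronous handshake described in this comment (modify, flip the data flag, fence on the observer side, then execute) can be loosely modelled with ordinary data. This is an analogy under stated assumptions, not real cross-modifying code: `GuardedCode`, `patch_and_disarm`, and `try_execute` are hypothetical names, the "instructions" are a plain integer, and C++ atomics only order data accesses, so the `fences` counter merely marks where an instruction cross-modification fence would be required on real hardware.

```cpp
#include <atomic>

// Hypothetical model: "instructions" are plain data in this sketch.
struct GuardedCode {
    int instructions = 0;               // stand-in for patched code bytes
    std::atomic<bool> disarmed{false};  // the data flag guarding execution
};

// Modifier side: patch first, then flip the flag.
void patch_and_disarm(GuardedCode& code, int new_instructions) {
    code.instructions = new_instructions;                  // 1. modify
    code.disarmed.store(true, std::memory_order_release);  // 2. flip flag
}

// Observer side: returns false while the code is still guarded.
bool try_execute(GuardedCode& code, int& result, int& fences) {
    if (!code.disarmed.load(std::memory_order_acquire)) {
        return false;  // flag not flipped yet, must not execute
    }
    // 3. Placeholder for the instruction cross-modification fence.
    //    Skipping this step after a data read of the flag is the
    //    incomplete dance the comment above warns about.
    ++fences;
    result = code.instructions;  // 4. "execute" the patched code
    return true;
}
```

The acquire/release pair makes the patched value visible as data; it says nothing about instruction fetch, which is why the explicit fence step exists in the real protocol.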
> Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?
FTR, I don't mind executing cross-modify-fence unconditionally. I do mind going into deopts too often. I do also think that we want to stay on performance-positive side for at least an easy variant of fix, and do potentially regressing things separately. The initial motivation for this work was to resolve an issue in a service workload that runs many threads with similar stacks, and get something that we are sure about for a prompt backport.
To that end, we can continue working out the final shape of the patch here, while we mitigate our current service problems with picking up a limited version of this patch with JDK-8333716 -- it resolves only Shenandoah parts of it, though. Or, we can integrate this patch in its current form, resolving the issue on both Shenandoah and ZGC paths, and work out the check removal as the follow up of JDK-8310239.
I think the latter alternative is more pragmatic.
> Having said that, perhaps we should file a separate issue to remove that check, since it seems to fix an actual bug, while I guess this was meant more as an optimization. What do you think?

> FTR, I don't mind executing cross-modify-fence unconditionally. I do mind going into deopts too often. I do also think that we want to stay on performance-positive side for at least an easy variant of fix, and do potentially regressing things separately. The initial motivation for this work was to resolve an issue in a service workload that runs many threads with similar stacks, and get something that we are sure about for a prompt backport.
Fair enough. For what it's worth, aside for the deopt stressing option with arbitrary frequency we can update, we will not deopt more. We just perform an extra cross modifying fence when racingly entering an nmethod concurrently being disarmed. Not performing it might be slightly faster, but is a bug. But I see your point.
> To that end, we can continue working out the final shape of the patch here, while we mitigate our current service problems with picking up a limited version of this patch with JDK-8333716 -- it resolves only Shenandoah parts of it, though. Or, we can integrate this patch in its current form, resolving the issue on both Shenandoah and ZGC paths, and work out the check removal as the follow up of JDK-8310239.

> I think the latter alternative is more pragmatic.
I'm okay with approving this patch, and we fix the actual bug separately. Sounds good? Then this is a refactoring and optimization, without the bug fix.
@fisk I just merged the latest changes. Do I need approval on the merge commit or can I integrate?
Thanks for the review & approval. I've created follow up bug - https://bugs.openjdk.org/browse/JDK-8334890
/integrate
@neethu-prasad Your change (at version 86be5b5f03c6596f0b1d889b7c8013beb6d832dd) is now ready to be sponsored by a Committer.
/sponsor
Going to push as commit c30e040342c69a213bdff321fdcb0d27ff740489.
Since your change was applied there have been 85 commits pushed to the master branch:
- 974dca80df71c5cbe492d1e8ca5cee76bcc79358: 8334223: Make Arena MEMFLAGs immutable
- e527e1c32fcc7b2560cec540bcde930075ac284a: 8334580: Deprecate no-arg constructor BasicSliderUI() for removal
- 3a26bbcebc2f7d11b172f2b16192a3adefeb8111: 8185429: [macos] After a modal dialog is closed, no window becomes active
- 4b153e5e051c01ad8d0c3ff335352918c2970fe6: 8306580: Propagate CDS dumping errors instead of directly exiting the VM
- 71a692ab435fdeea4ce8f8db7a55dd735c7c5016: 8321033: Avoid casting Array to GrowableArray
- 55c796946158aab1d019a57b77a33441d7b13065: 8334765: JFR: Log chunk waste
- b2930c5aeedf911ec893734181c1af0573e222f4: 8334040: jdk/classfile/CorpusTest.java timed out
- e825ccfe6652577e4e828e8e4dfe19be0ea77813: 8332362: Implement os::committed_in_range for MacOS and AIX
- 5ac2149b7bde947886533bf5996d977bb8ec66f1: 8334299: Deprecate LockingMode option, along with LM_LEGACY and LM_MONITOR
- 2e64d15144be03388104c762816c1ba629da9639: 8334564: VM startup: fatal error: FLAG_SET_ERGO cannot be used to set an invalid value for NonNMethodCodeHeapSize
- ... and 75 more: https://git.openjdk.org/jdk/compare/c94af6f943c179553d1827550847b93491d47506...master
Your commit was automatically rebased without conflicts.
@shipilev @neethu-prasad Pushed as commit c30e040342c69a213bdff321fdcb0d27ff740489.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.