jdk
8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See JEP 491 for further details.
In order to make the code review easier the changes have been split into the following initial 4 commits:
- Changes to allow unmounting a virtual thread that is currently holding monitors.
- Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
- Changes to allow unmounting a virtual thread blocked in Object.wait() and its timed-wait variants.
- Changes to tests, JFR pinned event, and other changes in the JDK libraries.
The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: the ppc changes were added recently and stand in their own commit after the initial ones.
The changes fix pinning issues when using LM_LIGHTWEIGHT, i.e. the default locking mode, (and LM_MONITOR which comes for free), but not when using LM_LEGACY mode. Note that the LockingMode flag has already been deprecated (JDK-8334299), with the intention to remove LM_LEGACY code in future releases.
Summary of changes
Unmount virtual thread while holding monitors
As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:
- We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
- For inflated monitors we now record the java.lang.Thread.tid of the owner in the ObjectMonitor's _owner field instead of a JavaThread*. This allows us to tie the owner of the monitor to a java.lang.Thread instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
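The LockStack hand-off in the first item can be pictured with a small Java model. This is only a sketch under stated assumptions: the class and method names (`LockStackHandoff`, `Carrier`, `StackChunk`, `freeze`, `thaw`) are hypothetical and do not mirror the HotSpot C++ code, and the empty-stack check on thaw simply makes the "carriers don't hold monitors" assumption visible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy model only; all names here are hypothetical, not HotSpot identifiers. */
public class LockStackHandoff {
    /** Stands in for a carrier thread and its per-thread lock stack. */
    static final class Carrier {
        final Deque<Object> lockStack = new ArrayDeque<>();
    }

    /** Stands in for the heap object that holds a frozen virtual thread. */
    static final class StackChunk {
        final List<Object> savedLocks = new ArrayList<>();
    }

    /** Freeze: copy the lock entries to the chunk, then clear the carrier's stack. */
    static void freeze(Carrier carrier, StackChunk chunk) {
        // Walk from the bottom of the stack to the top so thaw can restore the order.
        Iterator<Object> bottomToTop = carrier.lockStack.descendingIterator();
        while (bottomToTop.hasNext()) {
            chunk.savedLocks.add(bottomToTop.next());
        }
        carrier.lockStack.clear();
    }

    /** Thaw: copy the entries onto the next carrier, then clear them from the chunk. */
    static void thaw(StackChunk chunk, Carrier next) {
        // Model of the stated assumption that a carrier holds no monitors when mounting.
        if (!next.lockStack.isEmpty()) {
            throw new IllegalStateException("carrier already holds monitors");
        }
        for (Object lockedObject : chunk.savedLocks) {
            next.lockStack.push(lockedObject);
        }
        chunk.savedLocks.clear();
    }

    public static void main(String[] args) {
        Carrier first = new Carrier();
        Object a = new Object();
        Object b = new Object();
        first.lockStack.push(a);
        first.lockStack.push(b);

        StackChunk chunk = new StackChunk();
        freeze(first, chunk);

        Carrier second = new Carrier();
        thaw(chunk, second);
        System.out.println("top of stack preserved: " + (second.lockStack.peek() == b));
    }
}
```

The point of the model is that lock ownership travels with the virtual thread's saved state rather than staying behind on whichever carrier happened to run it.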
General notes about this part:
- Since virtual threads don't need to worry about holding monitors anymore, we don't need to count them, except for LM_LEGACY. So the majority of the platform dependent changes in this commit have to do with correcting this.
- Zero and x86 (32 bits) were counting monitors even though they don't implement continuations, so I fixed that to stop counting. The idea is to remove all the counting code once we remove LM_LEGACY.
- Macro LOOM_MONITOR_SUPPORT was added at the time to exclude ports that implement continuations but don't yet implement monitor support. It is removed later with the ppc commit changes.
- Since now a virtual thread can be unmounted while holding monitors, JVMTI methods GetOwnedMonitorInfo and GetOwnedMonitorStackDepthInfo had to be adapted.
Notes specific to the tid changes:
- The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount.
- Changes in the ObjectMonitor class in this commit are pretty much exclusively related to changing _owner and _succ from void* and JavaThread* respectively to int64_t.
- Although we are not trying to fix LM_LEGACY the tid changes apply to it as well since the inflated path is shared. Thus, in case of inflation by a contending thread, the BasicLock* cannot be stored in the _owner field as before. The _owner is instead set to anonymous as we do in LM_LIGHTWEIGHT, and the BasicLock* is stored in the new field _stack_locker.
- We already assume 32 bit platforms can handle 64 bit atomics, including cmpxchg (JDK-8318776) so the shared code can stay the same. The assembly code for the c2 fast paths has to be adapted though. On arm (32bits) we already jump directly to the slow path on inflated monitor case so there is nothing to do. For x86 (32bits), since the port is moving towards deprecation (JDK-8338285) there is no point in trying to optimize, so the code was changed to do the same thing we do for arm (32bits).
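To illustrate why a 64 bit thread id is a convenient owner token, here is a minimal Java sketch of a monitor whose owner field is a `long` updated with compare-and-set. It is an analogy, not the ObjectMonitor implementation: `TidOwnedMonitor` and the `NO_OWNER` sentinel are hypothetical, the lock is deliberately non-reentrant, and the anonymous-owner and `_stack_locker` handling is left out. It assumes JDK 21 or later for `Thread.ofVirtual()` and `Thread.threadId()`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only; names are hypothetical and the lock is not reentrant. */
public class TidOwnedMonitor {
    // Thread ids are positive, so 0 can serve as the "unowned" marker in this model.
    private static final long NO_OWNER = 0L;

    private final AtomicLong owner = new AtomicLong(NO_OWNER);

    /** Claims the monitor for the current thread's id with a single 64 bit CAS. */
    boolean tryEnter() {
        return owner.compareAndSet(NO_OWNER, Thread.currentThread().threadId());
    }

    /** True if the recorded id matches the current java.lang.Thread, whatever carrier runs it. */
    boolean isOwnedByCurrentThread() {
        return owner.get() == Thread.currentThread().threadId();
    }

    /** Releases the monitor; fails if the caller's id is not the recorded owner. */
    void exit() {
        if (!owner.compareAndSet(Thread.currentThread().threadId(), NO_OWNER)) {
            throw new IllegalMonitorStateException("current thread is not the owner");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TidOwnedMonitor monitor = new TidOwnedMonitor();
        Thread virtual = Thread.ofVirtual().start(() -> {
            monitor.tryEnter();
            Thread.yield(); // the virtual thread may be rescheduled on a different carrier here
            System.out.println("still owner after yield: " + monitor.isOwnedByCurrentThread());
            monitor.exit();
        });
        virtual.join();
    }
}
```

Because the id belongs to the `java.lang.Thread` instance, the ownership check gives the same answer before and after the virtual thread changes carriers.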
Unmounting a virtual thread blocked on synchronized
Currently virtual thread unmounting is always started from Java, either because of a voluntary call to Thread.yield() or because of performing some blocking operation such as I/O. Now we also allow unmounting from inside the VM, specifically when facing contention trying to acquire a Java monitor.
On failure to acquire a monitor inside ObjectMonitor::enter a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continuing execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to Continuation.run() to proceed with the unmount logic. Once the owner releases the monitor and selects it as the next successor, the virtual thread will be added to the scheduler queue to run again. The virtual thread will run and attempt to acquire the monitor again. If it succeeds then it will thaw frames as usual to continue execution back where it left off. If it fails it will unmount and wait again to be unblocked.
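From the Java side, the scenario this commit targets looks like the following sketch: many virtual threads contend for one monitor while the holder blocks inside the synchronized block. The class name `ContendedSynchronized` and the thread count are illustrative choices, and JDK 21 or later is assumed for `Thread.ofVirtual()`. The program produces the same result with or without this change; what differs is whether blocked threads keep their carriers.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative workload; class name and sizes are arbitrary choices. */
public class ContendedSynchronized {
    private static final Object LOCK = new Object();
    private static int counter; // guarded by LOCK

    /** Starts the given number of virtual threads that all contend for LOCK. */
    static int run(int threads) throws InterruptedException {
        synchronized (LOCK) {
            counter = 0;
        }
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            started.add(Thread.ofVirtual().start(() -> {
                synchronized (LOCK) {
                    try {
                        // Blocking while holding the monitor: the case that used to pin the carrier.
                        Thread.sleep(1);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    counter++;
                }
            }));
        }
        for (Thread t : started) {
            t.join();
        }
        return counter; // join() makes the final value visible here
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("increments: " + run(50));
    }
}
```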
General notes about this part:
- The easiest way to review these changes is to start from the monitorenter call in the interpreter and follow all the flow of the virtual thread, from unmounting to running again.
- Currently we use a dedicated unblocker thread to submit the virtual threads back to the scheduler queue. This avoids calls to Java from monitorexit. We are experimenting with removing this limitation, but that will be left as an enhancement for a future change.
- We cannot unmount the virtual thread when the monitor enter call is coming from jni_enter() or ObjectLocker since we would need to freeze native frames.
- If freezing fails, which almost always will be due to having native frames on the stack, the virtual thread will follow the normal platform thread logic but will do a timed-park instead. This is to alleviate some deadlock cases where the successor picked is an unmounted virtual thread that cannot run, which can happen during class loading or class initialization.
- After freezing all frames, and while adding itself to the _cxq the virtual thread could have successfully acquired the monitor. In that case we mark the preemption as cancelled. The virtual thread will still need to go back to the preempt stub to cleanup the physical stack but instead of unmounting it will call thaw to continue execution.
- The way we jump to the preempt stub is slightly different in the compiler and interpreter. For the compiled case we just patch a return address, so no new code is added. For the interpreter we cannot do this on all platforms so we just check a flag back in the interpreter. For the latter we also need to manually restore some state after we finally acquire the monitor and resume execution. All that logic is contained in new assembler method call_VM_preemptable().
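The outcomes described in the notes above can be summarised as a small decision function. This is a sketch of the control flow as read from this description, not VM code: `MonitorEnterFlow`, the `Outcome` names, and the boolean parameters are all hypothetical stand-ins for conditions the VM evaluates internally.

```java
/** Toy decision table; every name here is hypothetical. */
public class MonitorEnterFlow {
    enum Outcome {
        ACQUIRED,             // monitor taken without contention
        PARKED_ON_CARRIER,    // freeze failed, fall back to platform-thread logic with a timed park
        PREEMPTION_CANCELLED, // monitor acquired while enqueueing, thaw instead of unmounting
        UNMOUNTED             // frames frozen, waiter enqueued, carrier released
    }

    static Outcome enter(boolean acquiredImmediately,
                         boolean freezeSucceeds,
                         boolean acquiredWhileEnqueueing) {
        if (acquiredImmediately) {
            return Outcome.ACQUIRED;
        }
        if (!freezeSucceeds) {
            // Typically native frames on the stack, for example jni_enter() or ObjectLocker callers.
            return Outcome.PARKED_ON_CARRIER;
        }
        if (acquiredWhileEnqueueing) {
            // Still passes through the preempt stub to clean up the physical stack.
            return Outcome.PREEMPTION_CANCELLED;
        }
        return Outcome.UNMOUNTED;
    }

    public static void main(String[] args) {
        System.out.println(enter(false, true, false));
        System.out.println(enter(false, true, true));
        System.out.println(enter(false, false, false));
    }
}
```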
Notes specific to JVMTI changes:
- Since we are not unmounting from Java, there is no call to VirtualThread.yieldContinuation(). This means that we have to execute the equivalent of notifyJvmtiUnmount(/*hide*/true) for unmount, and of notifyJvmtiMount(/*hide*/false) for mount in the VM. The former is implemented with JvmtiUnmountBeginMark in Continuation::try_preempt(). The latter is implemented in method jvmti_mount_end() in ContinuationFreezeThaw at the end of thaw.
- When unmounting from Java the vthread unmount event is posted before we try to freeze the continuation. If that fails then we post the mount event. This all happens in VirtualThread.yieldContinuation(). When unmounting from the VM we only post the event once we know the freeze succeeded. Since at that point we are in the middle of the VTMS transition, posting the event is done in JvmtiVTMSTransitionDisabler::VTMS_unmount_end() after the transition finishes. Maybe the same thing should be done when unmounting from Java.
Unmounting a virtual thread blocked on Object.wait()
This commit just extends the previous mechanism to be able to unmount inside the VM on ObjectMonitor::wait.
General notes about this part:
- The mechanism works as before with the difference that now the call will come from the native wrapper. This requires adding support to the continuation code to handle native wrapper frames, which is a main part of the changes in this commit.
- Both the compiled and interpreted native wrapper code will check for preemption on return from the wait call, after we have transitioned back to _thread_in_Java.
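As a user-level illustration of the case handled here, the sketch below has one virtual thread block in Object.wait() until another virtual thread notifies it. The class name `VirtualWaitNotify` is illustrative and JDK 21 or later is assumed. The observable behaviour is the same before and after this change; the difference is that the waiting thread can now release its carrier.

```java
/** Illustrative example; the class name is an arbitrary choice. */
public class VirtualWaitNotify {
    private static final Object LOCK = new Object();
    private static boolean ready; // guarded by LOCK

    static String run() throws InterruptedException {
        synchronized (LOCK) {
            ready = false;
        }
        StringBuilder log = new StringBuilder(); // written only while holding LOCK

        Thread waiter = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                while (!ready) { // loop guards against spurious wakeups
                    try {
                        LOCK.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                log.append("woken");
            }
        });

        Thread notifier = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                ready = true;
                LOCK.notifyAll();
            }
        });

        waiter.join();
        notifier.join();
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```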
Note specific to JVMTI changes:
- If the monitor waited event is enabled we need to post it after the wait is done but before re-acquiring the monitor. Since the virtual thread is inside the VTMS transition at that point, we cannot do that directly. Currently in the code we end the transition, post the event and start the transition again. This is not ideal, and maybe we should unmount, post the event and then run again to try to reacquire the monitor.
Test changes + JFR Updates + Library code changes
Tests
- The tests in java/lang/Thread/virtual are updated to add more tests for monitor enter/exit and Object.wait/notify. New tests are added for JFR events, synchronized native methods, and stress testing for several scenarios.
- test/hotspot/gtest/nmt/test_vmatree.cpp is changed due to an alias that conflicts.
- A small number of tests, e.g. test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java and test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002, are updated so they are in sync with the JDK code.
- A number of JVMTI tests are updated to fix various issues, e.g. some tests saved a JNIEnv in a static.
Diagnosing remaining pinning issues
- The diagnostic option jdk.tracePinnedThreads is removed.
- The JFR jdk.VirtualThreadPinned event is changed so that it's now recorded in the VM, and for the following cases: parking when pinned, blocking in monitor enter when pinned, Object.wait when pinned, and waiting for a class to be initialized by another thread. The changes to object monitors should mean that only a few events are recorded. Future work may change this to a sampling approach.
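With jdk.tracePinnedThreads gone, one way to look for remaining pinning is to record the jdk.VirtualThreadPinned event programmatically. The following is a minimal sketch using the standard jdk.jfr API; the class name `PinnedEventCount`, the temporary file name, and the sample workload are illustrative assumptions, and the number of events reported depends on the JDK build and on the workload.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Illustrative helper; class name and workload are arbitrary choices. */
public class PinnedEventCount {
    /** Runs the workload in a virtual thread and counts jdk.VirtualThreadPinned events. */
    static long countPinnedEvents(Runnable workload) throws Exception {
        Path file = Files.createTempFile("pinned", ".jfr");
        try (Recording recording = new Recording()) {
            // Record every occurrence, regardless of duration, with stack traces.
            recording.enable("jdk.VirtualThreadPinned").withoutThreshold().withStackTrace();
            recording.start();
            Thread thread = Thread.ofVirtual().start(workload);
            thread.join();
            recording.stop();
            recording.dump(file);
        }
        List<RecordedEvent> events = RecordingFile.readAllEvents(file);
        Files.deleteIfExists(file);
        return events.stream()
                .filter(e -> e.getEventType().getName().equals("jdk.VirtualThreadPinned"))
                .count();
    }

    public static void main(String[] args) throws Exception {
        Object lock = new Object();
        long pinned = countPinnedEvents(() -> {
            synchronized (lock) {
                try {
                    Thread.sleep(50); // blocks while holding a monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        System.out.println("jdk.VirtualThreadPinned events: " + pinned);
    }
}
```

The same event can also be enabled from the command line or a .jfc settings file; the programmatic form is used here only to keep the example self-contained.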
Other changes to VirtualThread class
The VirtualThread implementation includes a few robustness changes. The park/parkNanos methods now park on the carrier if the freeze throws OOME. Moreover, the use of transitions is reduced so that the call out to the scheduler no longer requires a temporary transition.
Other changes to libraries:
- ReferenceQueue is reverted to use synchronized, the subclass based on ReentrantLock is removed. This change is done now because the changes for object monitors impact this area when there is preemption polling a reference queue.
- java.io is reverted to use synchronized. This change has been important for testing virtual threads. There will be follow-up cleanup in main-line after the JEP is integrated to remove InternalLock and its uses in java.io.
- The epoll and kqueue based Selectors are changed to preempt when doing blocking selects. This has been useful for testing virtual threads with some libraries, e.g. JDBC drivers. We could potentially separate this update if needed but it has been included in all testing and EA builds.
- sun.security.ssl.X509TrustManagerImpl is changed to eagerly initialize AnchorCertificates, a forced change due to deadlocks in this code when testing.
Testing
The changes have been running in the Loom pipeline for several months now. They have also been included in EA builds throughout the year at different stages (EA builds from earlier this year did not have Object.wait() support yet but more recent ones did) so there has been some external exposure too.
The current patch has been run through mach5 tiers 1-8. I'll keep running tests periodically until integration time.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change requires CSR request JDK-8338813 to be approved
Issues
- JDK-8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning (Enhancement - P2)
- JDK-8338813: Implement JEP 491: Synchronize Virtual Threads without Pinning (CSR)
Contributors
- Patricio Chilano Mateo <[email protected]>
- Alan Bateman <[email protected]>
- Andrew Haley <[email protected]>
- Fei Yang <[email protected]>
- Coleen Phillimore <[email protected]>
- Richard Reingruber <[email protected]>
- Martin Doerr <[email protected]>
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565
$ git checkout pull/21565
Update a local copy of the PR:
$ git checkout pull/21565
$ git pull https://git.openjdk.org/jdk.git pull/21565/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21565
View PR using the GUI difftool:
$ git pr show -t 21565
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21565.diff
Webrev
:wave: Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@pchilano This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
Co-authored-by: Patricio Chilano Mateo <[email protected]>
Co-authored-by: Alan Bateman <[email protected]>
Co-authored-by: Andrew Haley <[email protected]>
Co-authored-by: Fei Yang <[email protected]>
Co-authored-by: Coleen Phillimore <[email protected]>
Co-authored-by: Richard Reingruber <[email protected]>
Co-authored-by: Martin Doerr <[email protected]>
Reviewed-by: aboldtch, dholmes, coleenp, fbredberg, dlong, sspitsyn
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 12 new commits pushed to the master branch:
- 3727f4046188bb623f9efec6fa149f767a9ffa30: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling
- b53ee053f7f7ffcf02ff47e1895ce7be4bc32486: 8202617: javadoc generates broken links to undocumented (e.g. private) members
- cfe719fbded84dfbc8b25ee2d809ac90f86deb70: 8340565: Create separate index page for terms defined by the index tag
- baabfbba3e7b5d9c860de38f1f9ed9cd36848f29: 8341904: Search tag in inherited doc comment creates additional index item
- 4fa760a1ed24ad2e6fba6dca51c5cf7dc7436719: 8343936: Adjust timeout in test javax/management/monitor/DerivedGaugeMonitorTest.java
- cbf4dd588bf371e13e81204b1585d34bfadddb42: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic
- ef0dc2518e7636cc8a9ca580613ff5edeb4c19fd: 8342707: Prepare Gatherers for graduation from Preview
- 889f906235e99b7207f2e30e1f6f5771188f5a56: 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java
- 6088d620b44b83fac41ba403a059208414b32a89: 8343755: Unproblemlist java/lang/Thread/jni/AttachCurrentThread/AttachTest.java
- 80f4c0c38a57960a1c96de72af6fc69ef10337ce: 8343442: Add since checker tests to the networking area modules
- ... and 2 more: https://git.openjdk.org/jdk/compare/babb52a08361b00eb4bc6e2e109b1fdc198dbd59...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@pchilano The following labels will be automatically applied to this pull request:
core-libs graal hotspot nio security serviceability
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/contributor add @pchilano
/contributor add @AlanBateman
/contributor add @theRealAph
/contributor add @RealFYang
/contributor add @coleenp
@pchilano
Contributor Patricio Chilano Mateo <[email protected]> successfully added.
@pchilano
Contributor Alan Bateman <[email protected]> successfully added.
@pchilano
Contributor Andrew Haley <[email protected]> successfully added.
@pchilano
Contributor Fei Yang <[email protected]> successfully added.
@pchilano
Contributor Coleen Phillimore <[email protected]> successfully added.
/label remove security
@pchilano
The security label was successfully removed.
/contributor add @reinrich
/contributor add @TheRealMDoerr
@pchilano
Contributor Richard Reingruber <[email protected]> successfully added.
@pchilano
Contributor Martin Doerr <[email protected]> successfully added.
Webrevs
- 31: Full - Incremental (c0c7e6cf)
- 30: Full - Incremental (124efa0a)
- 29: Full - Incremental (79189f9b)
- 28: Full - Incremental (11396312)
- 27: Full - Incremental (52c26642)
- 26: Full - Incremental (33eb6388)
- 25: Full (113fb3d3)
- 24: Full - Incremental (e5a9ce2a)
- 23: Full - Incremental (aa263f56)
- 22: Full - Incremental (9f086c52)
- 21: Full - Incremental (0951dfe0)
- 20: Full - Incremental (aa682de2)
- 19: Full - Incremental (63003d37)
- 18: Full - Incremental (9fd4c036)
- 17: Full - Incremental (0f3b9021)
- 16: Full - Incremental (3e8b4fe6)
- 15: Full - Incremental (056d21ec)
- 14: Full - Incremental (fc9aa074)
- 13: Full - Incremental (bd918fa7)
- 12: Full - Incremental (7cb4cffd)
- 11: Full - Incremental (66d5385f)
- 10: Full - Incremental (d6313cf7)
- 09: Full - Incremental (0308ee4c)
- 08: Full - Incremental (c7a82c45)
- 07: Full - Incremental (03ba6dfb)
- 06: Full - Incremental (baf7ffab)
- 05: Full - Incremental (e232b7f3)
- 04: Full - Incremental (b6bc98e2)
- 03: Full - Incremental (81e5c6d0)
- 02: Full - Incremental (23d1a2be)
- 01: Full - Incremental (8c196acd)
- 00: Full (6a81ccdc)
* We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.
Then I looked at typing up the thread / lock ids as an enum class https://github.com/openjdk/jdk/commit/34221f4a50a492cad4785cfcbb4bef8fa51d6f23
Both of these suggested changes should be discussed as different RFEs. I don't really like this ThreadID change because it seems to introduce casting everywhere.
The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount.
Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time?
Keeping this value up-to-date is a part I find quite confusing.
This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.
There's nothing exposed today to allow custom schedulers. The experiments/explorations going on right now have to be careful to not hold any locks. Throwing if holding a monitor is an option, but it would need to be backed by spec and would also shine light on the issue of j.u.concurrent locks, as a carrier might independently hold a lock there too.
Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time?
We cache threadObj.thread_id in JavaThread::_lock_id so that the fast path c2_MacroAssembler code has one less load and code to find the offset of java.lang.Thread.threadId in the code. Also, yes, we were worried about performance of the barrier in this path.
Mailing list message from Olexandr Rotan on core-libs-dev:
Hi. Just wanted to express my gratitude to everyone who has been working on this and virtual threads as a whole. I am a big fan of this technology and seeing the largest issue go away makes me incredibly happy. Thanks to the Loom team and everyone else who took part in this great innovation (instead of turning each codebase into an async/await painting competition :) )
I looked at the java.lang.ref and java.lang.invoke changes. ReferenceQueue was reverted back to use synchronized, and the added code to disable/enable preemption also looks right.
The InternalLock and ByteArrayOutputStream changes look all right. I'll follow up with JDK-8343039 once this PR for JEP 491 is integrated.
On failure to acquire a monitor inside ObjectMonitor::enter a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to Continuation.run() to proceed with the unmount logic.
During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled.
Noticed while downloading this that some copyrights need updating.
On failure to acquire a monitor inside ObjectMonitor::enter a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to Continuation.run() to proceed with the unmount logic.
During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled.
The problem is that freezing the frames can fail. By then we would have already added the ObjectWaiter as representing a virtual thread. Regarding efficiency (and ignoring the previous issue) both approaches would be equal anyway, since regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e. cancel the preemption.
Looking at this reminds me of a paper I read a long time ago, "Using continuations to implement thread management and communication in operating systems" (https://dl.acm.org/doi/10.1145/121133.121155).
regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption.
Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead.
I have reviewed the changes to the NIO selector/poller implementations and they look fine.
regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption.
Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead.
After adding the ObjectWaiter to the _cxq we always have to retry acquiring the monitor; this is the same for platform threads. So freezing before that implies we have to retry. As for whether we need to cancel the preemption if we acquire the monitor, not necessarily. We could still unmount with a state of YIELDING, so the virtual thread will be scheduled to run again. So that part is an optimization to avoid the unmount.