jdk 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See JEP 491 for further details.

To make the code review easier, the changes have been split into the following four initial commits:

Changes to allow unmounting a virtual thread that is currently holding monitors.
Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
Changes to allow unmounting a virtual thread blocked in Object.wait() and its timed-wait variants.
Changes to tests, JFR pinned event, and other changes in the JDK libraries.

The changes fix pinning issues for all four ports that currently implement continuations: x64, aarch64, riscv, and ppc. Note: the ppc changes were added recently and are in their own commit after the initial ones.

The changes fix pinning issues when using LM_LIGHTWEIGHT, the default locking mode (and LM_MONITOR, which comes for free), but not when using LM_LEGACY. Note that the LockingMode flag has already been deprecated (JDK-8334299), with the intention of removing the LM_LEGACY code in a future release.

Summary of changes

Unmount virtual thread while holding monitors

As stated in the JEP, currently, when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, since the ownership information would otherwise be wrong. To remove this limitation we do two things:

We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
For inflated monitors we now record the java.lang.Thread.tid of the owner in the ObjectMonitor's _owner field instead of a JavaThread*. This ties the owner of the monitor to a java.lang.Thread instance rather than to a JavaThread, which only exists for platform threads. The tid is already a 64-bit field, so we can ignore the counter wrapping around.
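The LockStack hand-off described in the first item can be sketched as a toy Java model. This is not HotSpot code: `Carrier`, `StackChunk` and `LockStackTransfer` are hypothetical stand-ins, the lock stack is modelled as a list whose last element is the top, and the check in `thaw` mirrors the stated assumption that a carrier holds no monitors when it mounts a virtual thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a carrier (platform) thread.
class Carrier {
    final String name;
    // Models the carrier's LockStack of lightweight-locked objects; the last element is the top.
    final List<Object> lockStack = new ArrayList<>();

    Carrier(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the heap stackChunk of an unmounted virtual thread.
class StackChunk {
    // Lock oops saved at freeze time, bottom of the lock stack first.
    final List<Object> savedLocks = new ArrayList<>();
}

final class LockStackTransfer {
    private LockStackTransfer() {}

    // Freeze: copy the carrier's LockStack entries into the chunk, then clear the LockStack.
    static void freeze(Carrier carrier, StackChunk chunk) {
        chunk.savedLocks.addAll(carrier.lockStack);
        carrier.lockStack.clear();
    }

    // First thaw: copy the saved entries onto the next carrier's LockStack, then clear them from the chunk.
    static void thaw(StackChunk chunk, Carrier nextCarrier) {
        if (!nextCarrier.lockStack.isEmpty()) {
            // Mirrors the assumption that carriers hold no monitors while mounting a virtual thread.
            throw new IllegalStateException(nextCarrier.name + " already holds locks");
        }
        nextCarrier.lockStack.addAll(chunk.savedLocks);
        chunk.savedLocks.clear();
    }
}
```

For example, a virtual thread that locked `a` and then `b` on one carrier ends up with `a` below `b` on the next carrier's LockStack after `freeze` followed by `thaw`, and both the old LockStack and the chunk are left empty.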

General notes about this part:

Since virtual threads no longer need to worry about holding monitors, we don't need to count them, except for LM_LEGACY. So most of the platform-dependent changes in this commit deal with removing that counting.
Zero and x86 (32-bit) were counting monitors even though they don't implement continuations, so I changed them to stop counting. The idea is to remove all the counting code once LM_LEGACY is removed.
The macro LOOM_MONITOR_SUPPORT was added at the time to exclude ports that implement continuations but don't yet implement monitor support. It is removed later, with the ppc commit changes.
Since a virtual thread can now be unmounted while holding monitors, the JVMTI functions GetOwnedMonitorInfo and GetOwnedMonitorStackDepthInfo had to be adapted.

Notes specific to the tid changes:

The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount.
Changes in the ObjectMonitor class in this commit are almost exclusively related to changing _owner and _succ from void* and JavaThread*, respectively, to int64_t.
Although we are not trying to fix LM_LEGACY, the tid changes apply to it as well, since the inflated path is shared. Thus, in the case of inflation by a contending thread, the BasicLock* cannot be stored in the _owner field as before. Instead, _owner is set to anonymous, as we do in LM_LIGHTWEIGHT, and the BasicLock* is stored in the new field _stack_locker.
We already assume 32-bit platforms can handle 64-bit atomics, including cmpxchg (JDK-8318776), so the shared code can stay the same. The assembly code for the C2 fast paths has to be adapted, though. On arm (32-bit) we already jump directly to the slow path in the inflated monitor case, so there is nothing to do. For x86 (32-bit), since the port is moving towards deprecation (JDK-8338285), there is no point in trying to optimize, so the code was changed to do the same thing we do for arm (32-bit).
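A toy Java model of the tid-based ownership described above follows. It is not HotSpot code: `MonitorModel` and `CarrierModel` are hypothetical, the sentinel values `NO_OWNER = 0` and `ANONYMOUS_OWNER = -1` are assumptions chosen for the sketch, restoring the carrier's own tid on unmount is an assumed reading of "changed on mount/unmount", and the way the real owner claims an anonymous monitor is simplified to a single compare-and-set. It shows why ownership survives a remount on a different carrier (the owner is a tid, not a carrier) and how an anonymous owner plus a separate stack-locker field can later be resolved to a concrete tid.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of an inflated monitor whose _owner is a 64-bit thread id (tid).
final class MonitorModel {
    static final long NO_OWNER = 0;          // assumption: 0 means "unowned"
    static final long ANONYMOUS_OWNER = -1;  // assumption: sentinel for "stack-locked, owner not yet known"

    private final AtomicLong owner = new AtomicLong(NO_OWNER);
    // Models _stack_locker: the stack lock record (BasicLock*) while the owner is anonymous.
    private volatile Object stackLocker;

    boolean tryEnter(long tid) {
        return owner.compareAndSet(NO_OWNER, tid);
    }

    boolean isOwner(long tid) {
        return owner.get() == tid;
    }

    void exit(long tid) {
        if (!owner.compareAndSet(tid, NO_OWNER)) {
            throw new IllegalMonitorStateException("tid " + tid + " does not own the monitor");
        }
    }

    // A contending thread inflates a stack-locked monitor: the owner becomes anonymous and
    // the lock record is remembered separately, since _owner no longer holds a pointer.
    void inflateStackLocked(Object basicLock) {
        stackLocker = basicLock;
        owner.set(ANONYMOUS_OWNER);
    }

    // Simplified: the thread whose lock record matches claims the anonymous monitor.
    boolean claimAnonymous(long tid, Object myBasicLock) {
        if (stackLocker != myBasicLock || !owner.compareAndSet(ANONYMOUS_OWNER, tid)) {
            return false;
        }
        stackLocker = null;
        return true;
    }
}

// Models JavaThread::_lock_id: a cached copy of the tid of whichever thread is mounted.
final class CarrierModel {
    final long platformTid;
    long lockId;

    CarrierModel(long platformTid) {
        this.platformTid = platformTid;
        this.lockId = platformTid;   // set on creation
    }

    void mount(long virtualTid) {
        lockId = virtualTid;         // changed on mount
    }

    void unmount() {
        lockId = platformTid;        // assumption: restored to the carrier's own tid
    }
}
```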

Unmounting a virtual thread blocked on synchronized

Currently, virtual thread unmounting is always started from Java, either because of a voluntary call to Thread.yield() or because of a blocking operation such as I/O. Now we also allow unmounting from inside the VM, specifically when facing contention while trying to acquire a Java monitor.

On failure to acquire a monitor inside ObjectMonitor::enter, a virtual thread will call freeze to copy all Java frames to the heap. We add the virtual thread to the ObjectMonitor's queue and return to Java. Instead of continuing execution in Java, though, the virtual thread jumps to a preempt stub, which clears the frames copied from the physical stack and returns to Continuation.run() to proceed with the unmount logic. Once the owner releases the monitor and selects it as the next successor, the virtual thread is added back to the scheduler queue. The virtual thread then runs and attempts to acquire the monitor again. If it succeeds, it thaws frames as usual and continues execution where it left off. If it fails, it unmounts and waits again to be unblocked.
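The enter, unmount, and retry flow above can be sketched as a single-threaded Java model. Everything here is hypothetical and heavily simplified: freezing and thawing are not modelled, `schedulerQueue` stands in for the unblocker thread resubmitting the virtual thread, and the `afterEnqueue` hook exists only so a caller can release the monitor in the window between enqueuing and retrying, which is the case where the preemption is cancelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

enum EnterResult { ACQUIRED, PREEMPTION_CANCELLED, UNMOUNTED }

// Hypothetical single-threaded model of contended monitor enter for virtual threads.
final class PreemptableMonitor {
    static final long NO_OWNER = 0;                       // assumption: 0 means unowned
    long owner = NO_OWNER;
    final Deque<Long> entryQueue = new ArrayDeque<>();    // waiting virtual threads, by tid
    final List<Long> schedulerQueue = new ArrayList<>();  // tids resubmitted to the scheduler

    private boolean tryLock(long tid) {
        if (owner == NO_OWNER) {
            owner = tid;
            return true;
        }
        return false;
    }

    // First attempt, made while the virtual thread is still mounted.
    EnterResult enter(long tid, Runnable afterEnqueue) {
        if (tryLock(tid)) {
            return EnterResult.ACQUIRED;
        }
        // Freeze would copy the Java frames to the heap here; assumed to succeed in this model.
        entryQueue.addLast(tid);
        afterEnqueue.run();
        // After enqueuing we always retry, as platform threads do.
        if (tryLock(tid)) {
            entryQueue.remove(Long.valueOf(tid));
            schedulerQueue.remove(Long.valueOf(tid));
            return EnterResult.PREEMPTION_CANCELLED;  // back through the preempt stub, then thaw
        }
        return EnterResult.UNMOUNTED;                 // preempt stub returns to Continuation.run()
    }

    // Release: pick the head of the queue as successor and make it runnable again.
    void exit(long tid) {
        if (owner != tid) {
            throw new IllegalMonitorStateException();
        }
        owner = NO_OWNER;
        Long successor = entryQueue.peekFirst();
        if (successor != null && !schedulerQueue.contains(successor)) {
            schedulerQueue.add(successor);
        }
    }

    // The rescheduled virtual thread runs and retries: thaw on success, unmount again on failure.
    EnterResult resume(long tid) {
        schedulerQueue.remove(Long.valueOf(tid));
        if (tryLock(tid)) {
            entryQueue.remove(Long.valueOf(tid));
            return EnterResult.ACQUIRED;
        }
        return EnterResult.UNMOUNTED;                 // stays queued until selected again
    }
}
```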

General notes about this part:

The easiest way to review these changes is to start from the monitorenter call in the interpreter and follow the whole flow of the virtual thread, from unmounting to running again.
Currently we use a dedicated unblocker thread to submit the virtual threads back to the scheduler queue. This avoids calls to Java from monitorexit. We are experimenting with removing this limitation, but that will be left as an enhancement for a future change.
We cannot unmount the virtual thread when the monitor enter call comes from jni_enter() or ObjectLocker, since we would need to freeze native frames.
If freezing fails, which will almost always be due to native frames on the stack, the virtual thread follows the normal platform thread logic but does a timed-park instead. This alleviates some deadlock cases where the chosen successor is an unmounted virtual thread that cannot run, which can happen during class loading or class initialization.
After freezing all frames, and while adding itself to the _cxq, the virtual thread could have successfully acquired the monitor. In that case we mark the preemption as cancelled. The virtual thread still needs to go back to the preempt stub to clean up the physical stack, but instead of unmounting it calls thaw to continue execution.
The way we jump to the preempt stub differs slightly between the compiler and the interpreter. For the compiled case we just patch a return address, so no new code is added. For the interpreter we cannot do this on all platforms, so we just check a flag back in the interpreter. For the latter we also need to manually restore some state after we finally acquire the monitor and resume execution. All that logic is contained in the new assembler method call_VM_preemptable().
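The branching in the notes above (unmount, or fall back to the platform path with a timed-park) can be summarized as a small decision function. This is a hypothetical simplification, not HotSpot code: `BlockingAction` and `ContendedEnterPolicy` are illustrative names, and treating the jni_enter()/ObjectLocker case as a timed-park is an assumption of the sketch; the text above only says such calls cannot unmount.

```java
// Hypothetical, simplified model of how a contending thread blocks; not HotSpot code.
enum BlockingAction { UNMOUNT, PARK, TIMED_PARK }

final class ContendedEnterPolicy {
    private ContendedEnterPolicy() {}

    /**
     * @param isVirtual        the contending thread is a virtual thread
     * @param fromNativeCaller the enter comes from jni_enter() or ObjectLocker
     * @param freezeSucceeded  copying the Java frames to the heap worked
     */
    static BlockingAction onContention(boolean isVirtual, boolean fromNativeCaller, boolean freezeSucceeded) {
        if (!isVirtual) {
            return BlockingAction.PARK;        // platform thread: normal park
        }
        if (fromNativeCaller || !freezeSucceeded) {
            // Cannot unmount: follow the platform logic but park with a timeout
            // (assumed here for the native-caller case as well).
            return BlockingAction.TIMED_PARK;
        }
        return BlockingAction.UNMOUNT;         // frozen: jump to the preempt stub and unmount
    }
}
```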

Notes specific to JVMTI changes:

Since we are not unmounting from Java, there is no call to VirtualThread.yieldContinuation(). This means that we have to execute the equivalent of notifyJvmtiUnmount(/*hide*/true) for unmount, and of notifyJvmtiMount(/*hide*/false) for mount in the VM. The former is implemented with JvmtiUnmountBeginMark in Continuation::try_preempt(). The latter is implemented in method jvmti_mount_end() in ContinuationFreezeThaw at the end of thaw.
When unmounting from Java the vthread unmount event is posted before we try to freeze the continuation. If that fails then we post the mount event. This all happens in VirtualThread.yieldContinuation(). When unmounting from the VM we only post the event once we know the freeze succeeded. Since at that point we are in the middle of the VTMS transition, posting the event is done in JvmtiVTMSTransitionDisabler::VTMS_unmount_end() after the transition finishes. Maybe the same thing should be done when unmounting from Java.
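The difference in event ordering between the two unmount paths can be captured in a tiny model. This is hypothetical and not the JDK source: `UnmountEventOrder` is an illustrative name, and the strings `"unmount"` and `"mount"` stand in for the JVMTI virtual thread unmount and mount events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of when JVMTI mount/unmount events are posted; not the JDK source.
final class UnmountEventOrder {
    private UnmountEventOrder() {}

    // Unmount started from Java, in VirtualThread.yieldContinuation().
    static List<String> fromJava(boolean freezeSucceeded) {
        List<String> events = new ArrayList<>();
        events.add("unmount");       // posted before the freeze is attempted
        if (!freezeSucceeded) {
            events.add("mount");     // freeze failed: the thread stays mounted, so report it
        }
        return events;
    }

    // Unmount started from inside the VM (for example, a contended monitor enter).
    static List<String> fromVM(boolean freezeSucceeded) {
        List<String> events = new ArrayList<>();
        if (freezeSucceeded) {
            events.add("unmount");   // posted only after success, in VTMS_unmount_end()
        }
        return events;
    }
}
```

So a failed freeze produces an unmount/mount pair on the Java path but no events at all on the VM path.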

Unmounting a virtual thread blocked on `Object.wait()`

This commit extends the previous mechanism so that a virtual thread can also be unmounted inside the VM in ObjectMonitor::wait.

General notes about this part:

The mechanism works as before, except that the call now comes from the native wrapper. This requires adding support to the continuation code for native wrapper frames, which is a main part of the changes in this commit.
Both the compiled and interpreted native wrapper code will check for preemption on return from the wait call, after we have transitioned back to _thread_in_Java.
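For context, this is the kind of user code the commit targets: a guarded `Object.wait()` loop. The sketch is ordinary Java and uses a platform thread so it runs on any recent JDK. Per the description above, starting the consumer as a virtual thread instead (for example with `Thread.ofVirtual().unstarted(...)`, available since JDK 21) on a build with these changes would let it unmount while blocked in `wait()` rather than pin its carrier. `Mailbox` and `MailboxDemo` are illustrative names.

```java
// Illustrative guarded-wait pattern of the kind this commit lets virtual threads run without pinning.
final class Mailbox {
    private String message; // guarded by the monitor of 'this'

    synchronized void put(String m) {
        message = m;
        notifyAll(); // wake any thread blocked in take()
    }

    synchronized String take() throws InterruptedException {
        while (message == null) {   // loop guards against spurious wakeups
            wait();                 // releases the monitor while waiting
        }
        String m = message;
        message = null;
        return m;
    }
}

final class MailboxDemo {
    private MailboxDemo() {}

    // Hands one message from the caller to a consumer thread and returns what the consumer received.
    static String roundTrip(String payload) {
        Mailbox box = new Mailbox();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        box.put(payload);
        try {
            consumer.join(); // join() makes the consumer's write to received[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return received[0];
    }
}
```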

Note specific to JVMTI changes:

If the monitor waited event is enabled, we need to post it after the wait is done but before reacquiring the monitor. Since the virtual thread is inside the VTMS transition at that point, we cannot do that directly. Currently the code ends the transition, posts the event, and starts the transition again. This is not ideal; maybe we should instead unmount, post the event, and then run again to try to reacquire the monitor.

Test changes + JFR Updates + Library code changes

Tests

The tests in java/lang/Thread/virtual are updated to add more tests for monitor enter/exit and Object.wait/notify. New tests are added for JFR events, synchronized native methods, and stress testing for several scenarios.
test/hotspot/gtest/nmt/test_vmatree.cpp is changed because of a conflicting alias.
A small number of tests, e.g. test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java and test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002, are updated so they are in sync with the JDK code.
A number of JVMTI tests are updated to fix various issues, e.g. some tests saved a JNIEnv in a static.

Diagnosing remaining pinning issues

The diagnostic option jdk.tracePinnedThreads is removed.
The JFR jdk.VirtualThreadPinned event is changed so that it's now recorded in the VM, and for the following cases: parking when pinned, blocking in monitor enter when pinned, Object.wait when pinned, and waiting for a class to be initialized by another thread. The changes to object monitors should mean that only a few events are recorded. Future work may change this to a sampling approach.
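To see whether pinning still occurs, the `jdk.VirtualThreadPinned` event can be consumed in-process with the JFR streaming API. This is a minimal sketch, assuming JDK 14+ for `jdk.jfr.consumer.RecordingStream`; the 20 ms threshold and the `PinnedWatcher` name are arbitrary choices for illustration. The handler runs on a JFR thread, so the caller should pass a thread-safe list.

```java
import java.time.Duration;
import java.util.List;
import jdk.jfr.consumer.RecordingStream;

final class PinnedWatcher {
    private PinnedWatcher() {}

    // Starts an asynchronous JFR stream that adds one line per jdk.VirtualThreadPinned event.
    // The caller owns the returned stream and must close() it.
    static RecordingStream start(List<String> sink) {
        RecordingStream rs = new RecordingStream();
        rs.enable("jdk.VirtualThreadPinned")
          .withThreshold(Duration.ofMillis(20))  // illustrative threshold
          .withStackTrace();                      // record where the pinning happened
        rs.onEvent("jdk.VirtualThreadPinned", event ->
                sink.add("pinned for " + event.getDuration().toMillis() + " ms"));
        rs.startAsync();
        return rs;
    }
}
```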

Other changes to VirtualThread class

The VirtualThread implementation includes a few robustness changes. The park/parkNanos methods now park on the carrier if the freeze throws OOME. Moreover, the use of transitions is reduced so that the call out to the scheduler no longer requires a temporary transition.
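The OOME fallback for park can be sketched as a small model. This is hypothetical and not the VirtualThread source: `ParkFallback`, `YieldAttempt` and `ParkOutcome` are illustrative names, and it assumes, as in the existing VirtualThread code, that a yield that fails because the thread is pinned also parks on the carrier.

```java
// Hypothetical model of the park() fallback; not the VirtualThread source.
enum ParkOutcome { UNMOUNTED, PARKED_ON_CARRIER_PINNED, PARKED_ON_CARRIER_OOME }

final class ParkFallback {
    private ParkFallback() {}

    // Stand-in for yielding the continuation; returns true if the virtual thread unmounted.
    interface YieldAttempt {
        boolean tryYield();
    }

    static ParkOutcome park(YieldAttempt attempt) {
        try {
            // A false result models a pinned thread, which parks on its carrier.
            return attempt.tryYield() ? ParkOutcome.UNMOUNTED : ParkOutcome.PARKED_ON_CARRIER_PINNED;
        } catch (OutOfMemoryError e) {
            // Freeze could not allocate: report a park on the carrier instead of propagating the error.
            return ParkOutcome.PARKED_ON_CARRIER_OOME;
        }
    }
}
```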

Other changes to libraries:

ReferenceQueue is reverted to use synchronized; the subclass based on ReentrantLock is removed. This change is done now because the changes for object monitors impact this area when there is preemption while polling a reference queue.
java.io is reverted to use synchronized. This change has been important for testing virtual threads. There will be a follow-up cleanup in mainline after the JEP is integrated to remove InternalLock and its uses in java.io.
The epoll and kqueue based Selectors are changed to preempt when doing blocking selects. This has been useful for testing virtual threads with some libraries, e.g. JDBC drivers. We could potentially separate this update if needed but it has been included in all testing and EA builds.
sun.security.ssl.X509TrustManagerImpl is changed to eagerly initialize AnchorCertificates, a forced change due to deadlocks in this code when testing.
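The Selector change concerns ordinary blocking selects like the one below. The sketch just performs a timed `select` with no registered channels; per the description above, a virtual thread blocked in such a call on the epoll or kqueue based Selectors would be preempted (unmounted) rather than pin its carrier. `BlockingSelect` is an illustrative name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

final class BlockingSelect {
    private BlockingSelect() {}

    // Blocks for up to timeoutMillis; with no registered channels the result is 0.
    static int selectOnce(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            return selector.select(timeoutMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```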

Testing

The changes have been running in the Loom pipeline for several months now. They have also been included in EA builds throughout the year at different stages (EA builds from earlier this year did not have Object.wait() support yet, but more recent ones do), so there has been some external exposure too.

The current patch has been run through mach5 tiers 1-8. I'll keep running tests periodically until integration time.

Progress

[ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
[x] Change must not contain extraneous whitespace
[x] Commit message must refer to an issue
[ ] Change requires CSR request JDK-8338813 to be approved

Issues

JDK-8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning (Enhancement - P2)
JDK-8338813: Implement JEP 491: Synchronize Virtual Threads without Pinning (CSR)

Contributors

Patricio Chilano Mateo <[email protected]>
Alan Bateman <[email protected]>
Andrew Haley <[email protected]>
Fei Yang <[email protected]>
Coleen Phillimore <[email protected]>
Richard Reingruber <[email protected]>
Martin Doerr <[email protected]>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565
$ git checkout pull/21565

Update a local copy of the PR:
$ git checkout pull/21565
$ git pull https://git.openjdk.org/jdk.git pull/21565/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21565

View PR using the GUI difftool:
$ git pr show -t 21565

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21565.diff

Webrev

Link to Webrev Comment

Oct 17 '24 14:10 pchilano

:wave: Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Oct 17 '24 14:10 bridgekeeper[bot]

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

Co-authored-by: Patricio Chilano Mateo <[email protected]>
Co-authored-by: Alan Bateman <[email protected]>
Co-authored-by: Andrew Haley <[email protected]>
Co-authored-by: Fei Yang <[email protected]>
Co-authored-by: Coleen Phillimore <[email protected]>
Co-authored-by: Richard Reingruber <[email protected]>
Co-authored-by: Martin Doerr <[email protected]>
Reviewed-by: aboldtch, dholmes, coleenp, fbredberg, dlong, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 12 new commits pushed to the master branch:

3727f4046188bb623f9efec6fa149f767a9ffa30: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling
b53ee053f7f7ffcf02ff47e1895ce7be4bc32486: 8202617: javadoc generates broken links to undocumented (e.g. private) members
cfe719fbded84dfbc8b25ee2d809ac90f86deb70: 8340565: Create separate index page for terms defined by the index tag
baabfbba3e7b5d9c860de38f1f9ed9cd36848f29: 8341904: Search tag in inherited doc comment creates additional index item
4fa760a1ed24ad2e6fba6dca51c5cf7dc7436719: 8343936: Adjust timeout in test javax/management/monitor/DerivedGaugeMonitorTest.java
cbf4dd588bf371e13e81204b1585d34bfadddb42: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic
ef0dc2518e7636cc8a9ca580613ff5edeb4c19fd: 8342707: Prepare Gatherers for graduation from Preview
889f906235e99b7207f2e30e1f6f5771188f5a56: 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java
6088d620b44b83fac41ba403a059208414b32a89: 8343755: Unproblemlist java/lang/Thread/jni/AttachCurrentThread/AttachTest.java
80f4c0c38a57960a1c96de72af6fc69ef10337ce: 8343442: Add since checker tests to the networking area modules
... and 2 more: https://git.openjdk.org/jdk/compare/babb52a08361b00eb4bc6e2e109b1fdc198dbd59...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Oct 17 '24 14:10 openjdk[bot]

@pchilano The following labels will be automatically applied to this pull request:

core-libs
graal
hotspot
nio
security
serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Oct 17 '24 14:10 openjdk[bot]

/contributor add @pchilano
/contributor add @AlanBateman
/contributor add @theRealAph
/contributor add @RealFYang
/contributor add @coleenp

Oct 18 '24 04:10 pchilano

@pchilano Contributor Patricio Chilano Mateo <[email protected]> successfully added.

Oct 18 '24 04:10 openjdk[bot]

@pchilano Contributor Alan Bateman <[email protected]> successfully added.

Oct 18 '24 04:10 openjdk[bot]

@pchilano Contributor Andrew Haley <[email protected]> successfully added.

Oct 18 '24 04:10 openjdk[bot]

@pchilano Contributor Fei Yang <[email protected]> successfully added.

Oct 18 '24 04:10 openjdk[bot]

@pchilano Contributor Coleen Phillimore <[email protected]> successfully added.

Oct 18 '24 04:10 openjdk[bot]

/label remove security

Oct 18 '24 13:10 pchilano

@pchilano The security label was successfully removed.

Oct 18 '24 13:10 openjdk[bot]

/contributor add @reinrich
/contributor add @TheRealMDoerr

Oct 18 '24 17:10 pchilano

@pchilano Contributor Richard Reingruber <[email protected]> successfully added.

Oct 18 '24 17:10 openjdk[bot]

@pchilano Contributor Martin Doerr <[email protected]> successfully added.

Oct 18 '24 17:10 openjdk[bot]

Webrevs

Oct 18 '24 19:10 mlbridge[bot]

> We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.

This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.

Oct 22 '24 15:10 theRealAph

> Then I looked at typing up the thread / lock ids as an enum class https://github.com/openjdk/jdk/commit/34221f4a50a492cad4785cfcbb4bef8fa51d6f23

Both of these suggested changes should be discussed as different RFEs. I don't really like this ThreadID change because it seems to introduce casting everywhere.

Oct 22 '24 23:10 coleenp

> The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount.

Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time?

Keeping this value up-to-date is a part I find quite confusing.

Oct 23 '24 06:10 dholmes-ora

> This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.

There's nothing exposed today to allow custom schedulers. The experiments/explorations going on right now have to be careful not to hold any locks. Throwing if holding a monitor is an option, but it would need to be backed by the spec and would also shine a light on the issue of j.u.concurrent locks, as a carrier might independently hold a lock there too.

Oct 23 '24 10:10 AlanBateman

> Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time?

We cache threadObj.thread_id in JavaThread::_lock_id so that the fast path c2_MacroAssembler code has one less load and code to find the offset of java.lang.Thread.threadId in the code. Also, yes, we were worried about performance of the barrier in this path.

Oct 23 '24 19:10 coleenp

Mailing list message from Olexandr Rotan on core-libs-dev:

Hi. Just wanted to express my gratitude to everyone who has been working on this and on virtual threads as a whole. I am a big fan of this technology, and seeing the largest issue go away makes me incredibly happy. Thanks to the Loom team and everyone else who took part in this great innovation (instead of turning each codebase into an async/await painting competition :) )

On Thu, Oct 24, 2024, 10:06 Alan Bateman <alanb at openjdk.org> wrote:


Oct 24 '24 07:10 mlbridge[bot]

I looked at the java.lang.ref and java.lang.invoke changes. ReferenceQueue was reverted to use synchronized, and the added code to disable/enable preemption also looks right.

Oct 25 '24 17:10 mlchung

The InternalLock and ByteArrayOutputStream changes look all right. I'll follow up with JDK-8343039 once this PR for JEP 491 is integrated.

Oct 25 '24 18:10 bplb

> On failure to acquire a monitor inside ObjectMonitor::enter a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to Continuation.run() to proceed with the unmount logic.

During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled.

Oct 26 '24 02:10 dean-long

Noticed while downloading this that some copyrights need updating.

Oct 28 '24 16:10 coleenp

> On failure to acquire a monitor inside ObjectMonitor::enter a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to Continuation.run() to proceed with the unmount logic.

> During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled.

The problem is that freezing the frames can fail. By then we would have already added the ObjectWaiter as representing a virtual thread. Regarding efficiency (and ignoring the previous issue), both approaches would be equivalent anyway, since regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e., cancel the preemption.

Oct 28 '24 18:10 pchilano

Looking at this reminds me of a paper I read a long time ago, "Using continuations to implement thread management and communication in operating systems" (https://dl.acm.org/doi/10.1145/121133.121155).

Oct 28 '24 22:10 dean-long

> regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption.

Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead.

Oct 28 '24 23:10 dean-long

I have reviewed the changes to the NIO selector/poller implementations and they look fine.

Oct 29 '24 13:10 Michael-Mc-Mahon

> regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption.

> Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead.

After adding the ObjectWaiter to the _cxq we always have to retry acquiring the monitor; this is the same for platform threads. So freezing before that implies we have to retry. As for whether we need to cancel the preemption if we acquire the monitor, not necessarily. We could still unmount with a state of YIELDING, so the virtual thread will be scheduled to run again. So that part is an optimization to avoid the unmount.

Oct 29 '24 19:10 pchilano


Unmounting a virtual thread blocked on `Object.wait()`