8291714: Implement a Multi-Reader Single-Writer mutex for Hotspot
May I please have a review for this PR which implements a MRWMutex
class for Hotspot?
This PR does 3 things:
- Adds a port of ZGC's MRSW mutex (see [0]) to Hotspot
- Adds some new utilities for writing multi-threaded tests
- Adds some tests for the MRSW mutex using these new utilities
The ticket has some comments which might be worth checking out: https://bugs.openjdk.org/browse/JDK-8291714
[0] Original source code here: https://github.com/openjdk/zgc/blob/zgc_generational/src/hotspot/share/gc/z/zJNICritical.cpp
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8291714: Implement a Multi-Reader Single-Writer mutex for Hotspot
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9838/head:pull/9838
$ git checkout pull/9838
Update a local copy of the PR:
$ git checkout pull/9838
$ git pull https://git.openjdk.org/jdk pull/9838/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9838
View PR using the GUI difftool:
$ git pr show -t 9838
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9838.diff
:wave: Welcome back jdksjolen! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
/label add hotspot
@jdksjolen The following label will be automatically applied to this pull request:
-
hotspot-runtime
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
@jdksjolen The hotspot label was successfully added.
Webrevs
- 04: Full - Incremental (5bb6a520)
- 03: Full (da53eaa2)
- 02: Full - Incremental (d659314d)
- 01: Full - Incremental (e842c508)
- 00: Full (30883e04)
Multi-Reader Single-Writer mutex
Sorry to bikeshed but such a thing is called a ReadWriteLock or a ReadersWriterLock or something similar - can we please use familiar nomenclature here (even rwlock as per POSIX). "mutex" is a contraction of mutual-exclusion and as readers are not mutually exclusive it is an oxymoron to talk about a "multi-reader mutex". Thanks.
64 bit cmpxchg isn't available for some 32 bit CPUs. Maybe it could be int32_t?
Right. AFAICS only ARMv6 would be affected by this - does anyone still build for that? (grep for SUPPORTS_NATIVE_CX8). But a 32-bit count should be more than adequate.
Forgot to mention the code in the header file seems rather involved for a .hpp file - should these be in a .inline.hpp file instead?
Also did you establish this RW-lock will in fact be usable in this form for UL?
Also did you establish this RW-lock will in fact be usable in this form for UL?
I have rewritten the relevant code (the LogOutputList class) using this RW-lock and it can run the VM in both async and sync mode. It passes all the tests for the class, and I've sent off a Mach5 job for it. I have read the code and it seems reasonable that this should work.
Multi-Reader Single-Writer mutex
Sorry to bikeshed but such a thing is called a ReadWriteLock or a ReadersWriterLock or something similar - can we please use familiar nomenclature here (even rwlock as per POSIX). "mutex" is a contraction of mutual-exclusion and as readers are not mutually exclusive it is an oxymoron to talk about a "multi-reader mutex". Thanks.
For what it's worth, C++ Standard Library uses "mutex" to refer to an object that gets locked, and uses "lock" to refer to an RAII object that locks a "mutex". So it has, for example, std::shared_mutex, which is multi-reader/single-writer. The idea being that a "mutex" provides some kind of mutual exclusion - in the case of shared_mutex readers (multiple) are excluded from any writer, and writers are exclusive from each other. I'd prefer that to calling these new things "locks".
HotSpot Mutex has a rank, used (in debug builds) to detect potential deadlock situations. These new mutexes don't have that feature, and don't interact with Mutex in that way. The rank system has pros and cons, but I'm not sure just ignoring it here is a good thing.
What is the source of the algorithm being used here? Is there a correctness proof? How does it compare with, for example, Terekhov's algorithm?
HotSpot Mutex has a rank,
True, but I would put the read/write lock in the same basket as Semaphore, being completely independent of Mutex/Monitor. The onus is on the user to use them with care. A shared ranking mechanism would be tricky I think, and read locks don't fit well with ranking schemes.
How does it compare with, for example, Terekhov's algorithm?
I'm not familiar with this algorithm, do you have a link?
What is the source of the algorithm being used here?
I'll have to defer to @fisk for this question.
For what it's worth, C++ Standard Library uses "mutex" to refer to an object that gets locked, and uses "lock" to refer to an RAII object that locks a "mutex".
So their "lock" would be our MutexLocker? The far more dominant terminology is that a mutex is a lock.
How does it compare with, for example, Terekhov's algorithm?
I'm not familiar with this algorithm, do you have a link?
One place I found it. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2406.html#shared_mutex search for "shared_mutex Reference Implementation"
What's proposed by this change has some similarities. This moves some state manipulation out from under the lock by using atomic operations on the state. It only has one condvar, possibly due to the limitations of our portability layer. Terekhov's algorithm is fair in the sense that neither readers nor writers get preferential treatment; it defers to the OS to decide scheduling.
For what it's worth, C++ Standard Library uses "mutex" to refer to an object that gets locked, and uses "lock" to refer to an RAII object that locks a "mutex".
So their "lock" would be our MutexLocker? The far more dominant terminology is that a mutex is a lock.
Depends on which community. I think the C++ convention makes a lot of sense, esp. given the RAII approach to locking.
HotSpot Mutex has a rank,
True, but I would put the read/write lock in the same basket as Semaphore, being completely independent of Mutex/Monitor. The onus is on the user to use them with care. A shared ranking mechanism would be tricky I think, and read locks don't fit well with ranking schemes.
I would not put them in the same basket as semaphore. Semaphores are a signaling mechanism, with no concept of scope or (shared) ownership (though semaphores sometimes get conscripted into the implementation of such mechanisms).
I don't see any interesting difference between an unshared locking of a shared_mutex and locking an ordinary mutex. If ranks (or anything of a similar nature) make sense for the latter, then they make sense for the former and ought to be the same mechanism.
I agree that read locks with ranking could have problems. Though I wonder if "use with care" might solve most of those problems.
For some reason I can't believe that we don't have this feature already in the VM, or at least, I was expecting this to be a simple wrapper over globalCounter.hpp, or SingleWriterSynchronizer.hpp (whose name I can never say). If it's the same as the ZGC version, maybe the ZGC version should be moved or reconciled into the utilities directory.
GlobalCounter and SingleWriterSynchronizer are RCU mechanisms. While there are some similarities in use-cases, RCU is not equivalent to a shared mutex. I think there are use-cases for shared mutex that aren't appropriate (or even feasible?) for RCU. My impression is this came up because UL has its own attempt at an RCU mechanism that is broken in both implementation and usage. (That's my 2nd-hand impression; I've not really looked at it.) One solution would be to change UL to use one of our other RCU mechanisms and fix any usage problems. Another is to introduce a shared mutex and use that. Generational ZGC has a shared mutex to deal with a specific use-case, and this PR is proposing to generalize that.
What's proposed by this change has some similarities. This moves some state manipulation out from under the lock by using atomic operations on the state. It only has one condvar, possibly due to the limitations of our portability layer. Terekhov's algorithm is fair in the sense that neither readers nor writers get preferential treatment; it defers to the OS to decide scheduling.
Yeah, these are very similar solutions, seems like the major difference is the granularity of information provided by a condvar, locking instead of using atomics, and bit twiddling instead of just using the sign of a signed number.
I just pushed my first clean up of this code based on your comments.
Here's a short summary:
- Rename to ReadWriteLock
- Move to utilities
- Remove template, use a conditional branch + lambda for ThreadBlockInVM (painful way of doing it, stefank has a good idea with the nullptr strategy)
- Clarify some comments
- Remove nullptr case from Locker, as it's not used in this code anyway
- Use int32_t instead of int64_t for counter
Some of your comments are still not handled. Off the top of my head, I'm still not documenting everything that dholmes is interested in. @robehn has mentioned to me that there are issues when a safepointed thread holds a lock that the VM thread wants to hold, so that needs to be handled (he has also pointed out how to handle this).
Kim mentioned that UL might want to use GlobalCounter instead. This is viable, but contention on the global counter may lead to performance degradation for other users, AFAIK.
Finally, I will be simplifying this PR by pulling out the additions to threadHelper.inline.hpp into its own PR.
Thank you for the additional review, @dholmes-ora . I fixed those and added some more documentation.
The changes to threadHelper.inline.hpp are being merged through a separate patch (see JDK-8292679, https://github.com/openjdk/jdk/pull/9962), and so are not in need of review.
I don't have time to review, just chiming in that I look forward to use RWL in NMT, to simplify the lockless MST.
@tstuefe I have a memory of there being a lock like this in NMT, but it must have been removed.
If there's an upcoming use of this RWL lock, then we should integrate it first and not bundle it with the client code.
@tstuefe I have a memory of there being a lock like this in NMT, but it must have been removed.
There is a lock for queries. MST is lockless for normal operations, which makes the table a bit cumbersome and not ideally sorted (bucket chains are sorted in reverse of what would be good). Normally this is not a big problem since those chains are short.
If there's an upcoming use of this RWL lock, then we should integrate it first and not bundle it with the client code.
Certainly. I was not planning to do it right away, just voicing my support for this RWL idea.
I have to look again at this anyway; maybe I'm overlooking something. E.g. I'm not sure if the MST needs to be functional beyond the VM's normal lifetime. If it does, it is better to leave it lockless.
@jdksjolen Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.
As no current review comments were alive, I rebased onto master and force pushed. This reduces the PR a bit, as parts of it (the additions to threadHelper.inline.hpp) were merged in another PR and as such could be removed.
@jdksjolen please do not ever force push in a live PR. There is no need, you can just do your merge locally and then push the merge changesets.
Latest commit addresses most of Kim's review comments. It also makes the current thread optional, allowing it to be null. This is mainly useful if the lock is used very early in initialization.
I don't expect this to be a popular opinion, but the more I think about it the less I like the direct use of PlatformMonitor here rather than using Monitor. The "extra cruft" that comes with Monitor is there for reasons, and I'm not convinced that bypassing all that is good.