jdk
jdk copied to clipboard
8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
Java 17 added RandomGenerator interface. However, existing method Collections.shuffle accepts old java.util.Random class. While since Java 19, it's possible to use Random.from(RandomGenerator) wrapper, it would be more convenient to provide direct overload shuffle(List<?> list, RandomGenerator rnd).
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change requires a CSR request to be approved
Issues
- JDK-8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
- JDK-8294694: Add Collections.shuffle overload that accepts RandomGenerator interface (CSR)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10520/head:pull/10520
$ git checkout pull/10520
Update a local copy of the PR:
$ git checkout pull/10520
$ git pull https://git.openjdk.org/jdk pull/10520/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 10520
View PR using the GUI difftool:
$ git pr show -t 10520
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10520.diff
:wave: Welcome back tvaleev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
/csr
@amaembo an approved CSR request is already required for this pull request.
@amaembo The following label will be automatically applied to this pull request:
core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 05: Full - Incremental (37a8c9f0)
- 04: Full - Incremental (a465bd79)
- 03: Full - Incremental (7b4486f8)
- 02: Full - Incremental (127e44ca)
- 01: Full - Incremental (6fa7d942)
- 00: Full (40a69fec)
See my comments in JDK-8218282. While updating the one-arg Collections::shuffle to use ThreadLocalRandom seems obvious, it's not clear to me that we actually want to do that.
@stuart-marks thank you for the information. As this part is controversial, and there's a separate issue for it, I'll remove it from my PR and edit the description, so it could be addressed separately.
Should this PR be held back by JEP: 431: Sequenced Collections?
Depending on the final API for SequencedCollection I would think it would be possible to shuffle the contents of a sequenced collection (accessed order set) if replaceAll(java.util.function.UnaryOperator) was added from List. Shuffle could be implemented like the current non-random access branch today but replace the ListIterator logic with replaceAll.
Waiting for that API to settle might save us from having to add another shuffle overload to Collections. Currently there is a workaround to shuffle with a RandomGenerator by using the wrapper which should remove some of the pressure to get this into a release quickly.
@jmehrens I don't think that replaceAll would work nicely with SequencedCollection. It will be definitely unsupported for sorted sets. For unsorted sets like LinkedHashSet, it's unclear how to behave if replaceAll returns identical elements. Throw an exception? Shrink the set size via deduplication? Also, there's no way to optimize the implementation better than clearing the set and inserting the results back, so you need to compute hash codes, distribute items to buckets, probably create tree nodes in case of collisions, etc. It would be simpler if replaceAll function was forced to return a permutation (like shuffle does), but it would also limit the usefulness severely.
I don't think that shuffling anything but List has a big value. Usually it's ok to copy into a fresh ArrayList and shuffle it instead. You won't get significant performance improvement if LinkedHashSet would be able to shuffle itself in-place. Some improvement for ArrayDeque would be possible but again, dumping it into a new ArrayList would solve the problem. Note also the proposal to provide a list view from ArrayDeque: https://bugs.openjdk.org/browse/JDK-8143850
To summarize, I have big doubts that generalizing shuffling to SequencedCollection interface would be useful.
For unsorted sets like LinkedHashSet, it's unclear how to behave if replaceAll returns identical elements. Throw an exception? Shrink the set size via deduplication?
I would assume the spec for replaceAll would be borrowed from Map.html#replaceAll(java.util.function.BiFunction). Shrinking size is not an option because that is a ConcurrentModificationException. Adding elements to the set would a ConcurrentModificationException. Therefore, LinkedHashSet::replaceAll would have to be swapping the order of the linked entries already contained in the set. So in your example of returning identical elements the answer is CCE (or IAE) if the element is not contained in the set. Otherwise, the element is eventually moved to the last entry in the set.
I'm having difficulty imagining how SequencedCollection::replaceAll could work or be useful. Obviously it's on List already. It could be added to Deque, because like List, its elements are explicitly positioned (positioning via the API, not by value) and more importantly, the elements are unconstrained. That is, an element could be replaced by any other value and not affect or be affected by any other elements in the collection.
However, LinkedHashSet is a Set and so has a unique-elements constraint. What should replaceAll do if the replacement element already exists in the set? (For a shuffle operation, the first swap operation by definition replaces an element with another element that's somewhere else in the set, so this issue arises immediately.) Maybe the unique-elements constraint is suspended for the duration of the call, and it's only enforced at the end of the call. Conceptually that might work, but I have trouble envisioning an implementation that can do that.
A replaceAll operation on a NavigableSet seems right out. The elements are positioned according to their sort order, so there's no notion of updating an element "in-place" at all.
The comparison to Map::replaceAll doesn't seem to help; that method replaces only the values of the map entries, which are unconstrained. All of the constraints of maps (uniqueness, sort order) are on the keys.
For these reasons I don't think it makes sense to couple this PR to JEP 431.
jdk.test.lib.RandomFactory can be used to generate a reproducible sequence of random numbers. An example of its use may be seen for example in java/nio/file/Files/CopyAndMove.java
I'm having difficulty imagining how SequencedCollection::replaceAll could work or be useful. Obviously it's on List already. In Collections.java, it seems that the cases of AbstractSequentialList are handled like so:
- Call toArray
- Perform some permutation on the array copy.
- Write results back to using a loop ListIterator::set.
Step three can always be rewritten with List::replaceAll (new RFE). SequencedCollection is somewhat like AbstractSequentialList because calls to get are not used. Therefore, if AbstractSequentialList has positional read access but it is not used and SequencedCollection doesn't have positional access either then the two types could be substituted if we had some sort of way to write the permutation back to the SequencedCollection. One way to do that is replaceAll as it is a sequenced (non-random) write operation.
However, LinkedHashSet is a Set and so has a unique-elements constraint. What should replaceAll do if the replacement element already exists in the set? (For a shuffle operation, the first swap operation by definition replaces an element with another element that's somewhere else in the set, so this issue arises immediately.)
On an SequencedSet it is effectively a swap operation. Ejecting the existing element would be a CCE and adding a new element would be a CCE so the only way to implement is with position swap. It just would mean the UnaryOperator could encounter the same element for the whole iteration if it was constantly swapped with (i + 1). Perhaps that is violation where this all falls apart?
A replaceAll operation on a NavigableSet seems right out. The elements are positioned according to their sort order, so there's no notion of updating an element "in-place" at all.
My understand is that addFirst/addLast would be present on NavigableSet per retrofitting section of the JEP. The replaceAll would have API wiggle words the same way addFirst/addLast would.
At any rate I don't want to derail this PR and I appreciate your response. It just seems odd to me that SequencedCollection has a sequence but we can't sort or shuffle that sequence via Collections.java as we do for AbstractSequentialList.
@amaembo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
A gentle ping: please review the change and the CSR. Thanks.
@amaembo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Sorry, this got derailed by the discussion of SequencedCollection::replaceAll. I think we should just set that issue aside. I don't think SequencedCollection::replaceAll makes sense, thus this PR shouldn't be held up on its account.
The change is fine in concept. A couple adjustments needs to be made: javadoc since-tags need to be updated to 21, and copyrights updated to 2023. (Well, maybe an argument could be made that they should stay at 2022. I don't particularly care.) Also, the CSR should be updated to reflect the changes to the specifications. Other than those items, this should be fine to move forward.
@amaembo a couple comments on the test.
The test should probably have @key randomness added to it.
On 2022-10-28, @bplb wrote:
jdk.test.lib.RandomFactory can be used to generate a reproducible sequence of random numbers. An example of its use may be seen for example in java/nio/file/Files/CopyAndMove.java
This bit of the test library is useful if the test is testing a random subset of the state space. It prints out the random seed on each run so that if one of the test cases fails, it's possible to reproduce it by supplying the same seed. However, it's restricted to Random and SplittableRandom, and we want to test something like Xoshiro256PlusPlus that is a RandomGenerator but not a Random. So maybe this test library can't be applied. However, take a look and see if you think it might be useful to use it.
@amaembo Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
@stuart-marks thank you! I've updated PR (only since and copyright year) and CSR. Also rebased the change, though probably I should not? Hopefully, it's not very harmful.
Thanks for updates. I've tweaked the CSR a bit and I've marked it reviewed; please go ahead and move it to Finalized.
(On rebase vs. merge, merge is definitely preferable. Merges preserve PR history more faithfully. Merge commits as well as intermediate commits all get squashed when the PR is integrated, so the result is a single commit in the mainline repo. Rebasing isn't a big deal with a small change like this, but rebasing would probably confuse things if this PR were more complicated or if it had a lot of comments on different versions.)
Marking this Approved so it can be integrated after the CSR completes.
/integrate
@amaembo This pull request has not yet been marked as ready for integration.
/integrate
@amaembo This pull request has not yet been marked as ready for integration.
/csr
Hm... is something still missing?
@amaembo an approved CSR request is already required for this pull request.
The CSR still needs to be Approved.
(Yes, unfortunately, Finalized is not a terminal state. It means the writing of the proposal has been finalized and is ready to be evaluated for approval.)
@amaembo This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
Reviewed-by: smarks
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 138 new commits pushed to the master branch:
- c8dd7583a92082bcd2a4dfd5429889e7f0a44050: 8300260: Remove metaprogramming/isSame.hpp
- a6c2a2ae79be6810dca55b13bfc8a7625f25d48d: 8300692: GCC 12 reports some compiler warnings in bundled freetype
- bb42e61a6176a7f4f9485efa47a248b23b09a16d: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
- 06394ee8b110fe8e37a3b9e582f5dfbf225a3d89: 8300590: [JVMCI] BytecodeFrame.equals is broken
- 5331a3ef739166b2a2b0871fc9615f2c99effa89: 8298908: Instrument Metaspace for ASan
- e1ee6727f70209cf046cafba109835ad4acc1c23: 8300725: Improve performance of ColorConvertOp for default destinations with alpha
- 7c2f77a42293eb79829fce99bfce82e89a5df6d7: 8300584: Accelerate AVX-512 CRC32C for small buffers
- 5784eb7b68a880e130fda5f07c527187764038a2: 8300721: Cleanup ProblemList-svc-vthread.txt
- 9d44dd0cca620ef8e16e0c4306e6e54d8de6d1e8: 8297972: Poly1305 Endianness on ByteBuffer not enforced
- facd41511b972e940ecab3bc57f5f23efca43343: 8297757: VarHandles.getStaticFieldFromBaseAndOffset should get the receiver type from VarHandle
- ... and 128 more: https://git.openjdk.org/jdk/compare/4be94e435002d9d6cb594a393e9e35d650a9a0c9...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.