jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8294693: Add Collections.shuffle overload that accepts RandomGenerator interface

Open amaembo opened this issue 3 years ago • 8 comments
trafficstars

Java 17 added RandomGenerator interface. However, existing method Collections.shuffle accepts old java.util.Random class. While since Java 19, it's possible to use Random.from(RandomGenerator) wrapper, it would be more convenient to provide direct overload shuffle(List<?> list, RandomGenerator rnd).


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change requires a CSR request to be approved

Issues

  • JDK-8294693: Add Collections.shuffle overload that accepts RandomGenerator interface
  • JDK-8294694: Add Collections.shuffle overload that accepts RandomGenerator interface (CSR)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10520/head:pull/10520
$ git checkout pull/10520

Update a local copy of the PR:
$ git checkout pull/10520
$ git pull https://git.openjdk.org/jdk pull/10520/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10520

View PR using the GUI difftool:
$ git pr show -t 10520

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10520.diff

amaembo avatar Oct 01 '22 08:10 amaembo

:wave: Welcome back tvaleev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Oct 01 '22 08:10 bridgekeeper[bot]

/csr

amaembo avatar Oct 01 '22 08:10 amaembo

@amaembo an approved CSR request is already required for this pull request.

openjdk[bot] avatar Oct 01 '22 08:10 openjdk[bot]

@amaembo The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Oct 01 '22 08:10 openjdk[bot]

See my comments in JDK-8218282. While updating the one-arg Collections::shuffle to use ThreadLocalRandom seems obvious, it's not clear to me that we actually want to do that.

stuart-marks avatar Oct 03 '22 21:10 stuart-marks

@stuart-marks thank you for the information. As this part is controversial, and there's a separate issue for it, I'll remove it from my PR and edit the description, so it could be addressed separately.

amaembo avatar Oct 08 '22 08:10 amaembo

Should this PR be held back by JEP: 431: Sequenced Collections?

Depending on the final API for SequencedCollection I would think it would be possible to shuffle the contents of a sequenced collection (accessed order set) if replaceAll(java.util.function.UnaryOperator) was added from List. Shuffle could be implemented like the current non-random access branch today but replace the ListIterator logic with replaceAll.

Waiting for that API to settle might save us from having to add another shuffle overload to Collections. Currently there is a workaround to shuffle with a RandomGenerator by using the wrapper which should remove some of the pressure to get this into a release quickly.

jmehrens avatar Oct 14 '22 02:10 jmehrens

@jmehrens I don't think that replaceAll would work nicely with SequencedCollection. It will be definitely unsupported for sorted sets. For unsorted sets like LinkedHashSet, it's unclear how to behave if replaceAll returns identical elements. Throw an exception? Shrink the set size via deduplication? Also, there's no way to optimize the implementation better than clearing the set and inserting the results back, so you need to compute hash codes, distribute items to buckets, probably create tree nodes in case of collisions, etc. It would be simpler if replaceAll function was forced to return a permutation (like shuffle does), but it would also limit the usefulness severely.

I don't think that shuffling anything but List has a big value. Usually it's ok to copy into a fresh ArrayList and shuffle it instead. You won't get significant performance improvement if LinkedHashSet would be able to shuffle itself in-place. Some improvement for ArrayDeque would be possible but again, dumping it into a new ArrayList would solve the problem. Note also the proposal to provide a list view from ArrayDeque: https://bugs.openjdk.org/browse/JDK-8143850

To summarize, I have big doubts that generalizing shuffling to SequencedCollection interface would be useful.

amaembo avatar Oct 15 '22 07:10 amaembo

For unsorted sets like LinkedHashSet, it's unclear how to behave if replaceAll returns identical elements. Throw an exception? Shrink the set size via deduplication?

I would assume the spec for replaceAll would be borrowed from Map.html#replaceAll(java.util.function.BiFunction). Shrinking size is not an option because that is a ConcurrentModificationException. Adding elements to the set would a ConcurrentModificationException. Therefore, LinkedHashSet::replaceAll would have to be swapping the order of the linked entries already contained in the set. So in your example of returning identical elements the answer is CCE (or IAE) if the element is not contained in the set. Otherwise, the element is eventually moved to the last entry in the set.

jmehrens avatar Oct 18 '22 01:10 jmehrens

I'm having difficulty imagining how SequencedCollection::replaceAll could work or be useful. Obviously it's on List already. It could be added to Deque, because like List, its elements are explicitly positioned (positioning via the API, not by value) and more importantly, the elements are unconstrained. That is, an element could be replaced by any other value and not affect or be affected by any other elements in the collection.

However, LinkedHashSet is a Set and so has a unique-elements constraint. What should replaceAll do if the replacement element already exists in the set? (For a shuffle operation, the first swap operation by definition replaces an element with another element that's somewhere else in the set, so this issue arises immediately.) Maybe the unique-elements constraint is suspended for the duration of the call, and it's only enforced at the end of the call. Conceptually that might work, but I have trouble envisioning an implementation that can do that.

A replaceAll operation on a NavigableSet seems right out. The elements are positioned according to their sort order, so there's no notion of updating an element "in-place" at all.

The comparison to Map::replaceAll doesn't seem to help; that method replaces only the values of the map entries, which are unconstrained. All of the constraints of maps (uniqueness, sort order) are on the keys.

For these reasons I don't think it makes sense to couple this PR to JEP 431.

stuart-marks avatar Oct 28 '22 23:10 stuart-marks

jdk.test.lib.RandomFactory can be used to generate a reproducible sequence of random numbers. An example of its use may be seen for example in java/nio/file/Files/CopyAndMove.java

bplb avatar Oct 29 '22 00:10 bplb

I'm having difficulty imagining how SequencedCollection::replaceAll could work or be useful. Obviously it's on List already. In Collections.java, it seems that the cases of AbstractSequentialList are handled like so:

  1. Call toArray
  2. Perform some permutation on the array copy.
  3. Write results back to using a loop ListIterator::set.

Step three can always be rewritten with List::replaceAll (new RFE). SequencedCollection is somewhat like AbstractSequentialList because calls to get are not used. Therefore, if AbstractSequentialList has positional read access but it is not used and SequencedCollection doesn't have positional access either then the two types could be substituted if we had some sort of way to write the permutation back to the SequencedCollection. One way to do that is replaceAll as it is a sequenced (non-random) write operation.

However, LinkedHashSet is a Set and so has a unique-elements constraint. What should replaceAll do if the replacement element already exists in the set? (For a shuffle operation, the first swap operation by definition replaces an element with another element that's somewhere else in the set, so this issue arises immediately.)

On an SequencedSet it is effectively a swap operation. Ejecting the existing element would be a CCE and adding a new element would be a CCE so the only way to implement is with position swap. It just would mean the UnaryOperator could encounter the same element for the whole iteration if it was constantly swapped with (i + 1). Perhaps that is violation where this all falls apart?

A replaceAll operation on a NavigableSet seems right out. The elements are positioned according to their sort order, so there's no notion of updating an element "in-place" at all.

My understand is that addFirst/addLast would be present on NavigableSet per retrofitting section of the JEP. The replaceAll would have API wiggle words the same way addFirst/addLast would.

At any rate I don't want to derail this PR and I appreciate your response. It just seems odd to me that SequencedCollection has a sequence but we can't sort or shuffle that sequence via Collections.java as we do for AbstractSequentialList.

jmehrens avatar Oct 31 '22 14:10 jmehrens

@amaembo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Nov 29 '22 02:11 bridgekeeper[bot]

A gentle ping: please review the change and the CSR. Thanks.

amaembo avatar Dec 03 '22 08:12 amaembo

@amaembo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Dec 31 '22 11:12 bridgekeeper[bot]

Sorry, this got derailed by the discussion of SequencedCollection::replaceAll. I think we should just set that issue aside. I don't think SequencedCollection::replaceAll makes sense, thus this PR shouldn't be held up on its account.

The change is fine in concept. A couple adjustments needs to be made: javadoc since-tags need to be updated to 21, and copyrights updated to 2023. (Well, maybe an argument could be made that they should stay at 2022. I don't particularly care.) Also, the CSR should be updated to reflect the changes to the specifications. Other than those items, this should be fine to move forward.

stuart-marks avatar Jan 04 '23 21:01 stuart-marks

@amaembo a couple comments on the test.

The test should probably have @key randomness added to it.

On 2022-10-28, @bplb wrote:

jdk.test.lib.RandomFactory can be used to generate a reproducible sequence of random numbers. An example of its use may be seen for example in java/nio/file/Files/CopyAndMove.java

This bit of the test library is useful if the test is testing a random subset of the state space. It prints out the random seed on each run so that if one of the test cases fails, it's possible to reproduce it by supplying the same seed. However, it's restricted to Random and SplittableRandom, and we want to test something like Xoshiro256PlusPlus that is a RandomGenerator but not a Random. So maybe this test library can't be applied. However, take a look and see if you think it might be useful to use it.

stuart-marks avatar Jan 04 '23 21:01 stuart-marks

@amaembo Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk[bot] avatar Jan 15 '23 07:01 openjdk[bot]

@stuart-marks thank you! I've updated PR (only since and copyright year) and CSR. Also rebased the change, though probably I should not? Hopefully, it's not very harmful.

amaembo avatar Jan 15 '23 07:01 amaembo

Thanks for updates. I've tweaked the CSR a bit and I've marked it reviewed; please go ahead and move it to Finalized.

(On rebase vs. merge, merge is definitely preferable. Merges preserve PR history more faithfully. Merge commits as well as intermediate commits all get squashed when the PR is integrated, so the result is a single commit in the mainline repo. Rebasing isn't a big deal with a small change like this, but rebasing would probably confuse things if this PR were more complicated or if it had a lot of comments on different versions.)

Marking this Approved so it can be integrated after the CSR completes.

stuart-marks avatar Jan 18 '23 18:01 stuart-marks

/integrate

amaembo avatar Jan 18 '23 21:01 amaembo

@amaembo This pull request has not yet been marked as ready for integration.

openjdk[bot] avatar Jan 18 '23 21:01 openjdk[bot]

/integrate

amaembo avatar Jan 18 '23 21:01 amaembo

@amaembo This pull request has not yet been marked as ready for integration.

openjdk[bot] avatar Jan 18 '23 21:01 openjdk[bot]

/csr

amaembo avatar Jan 18 '23 22:01 amaembo

Hm... is something still missing?

amaembo avatar Jan 18 '23 22:01 amaembo

@amaembo an approved CSR request is already required for this pull request.

openjdk[bot] avatar Jan 18 '23 22:01 openjdk[bot]

The CSR still needs to be Approved.

(Yes, unfortunately, Finalized is not a terminal state. It means the writing of the proposal has been finalized and is ready to be evaluated for approval.)

stuart-marks avatar Jan 18 '23 22:01 stuart-marks

@amaembo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8294693: Add Collections.shuffle overload that accepts RandomGenerator interface

Reviewed-by: smarks

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 138 new commits pushed to the master branch:

  • c8dd7583a92082bcd2a4dfd5429889e7f0a44050: 8300260: Remove metaprogramming/isSame.hpp
  • a6c2a2ae79be6810dca55b13bfc8a7625f25d48d: 8300692: GCC 12 reports some compiler warnings in bundled freetype
  • bb42e61a6176a7f4f9485efa47a248b23b09a16d: 8300493: Use ArraysSupport.vectorizedHashCode in j.u.zip.ZipCoder
  • 06394ee8b110fe8e37a3b9e582f5dfbf225a3d89: 8300590: [JVMCI] BytecodeFrame.equals is broken
  • 5331a3ef739166b2a2b0871fc9615f2c99effa89: 8298908: Instrument Metaspace for ASan
  • e1ee6727f70209cf046cafba109835ad4acc1c23: 8300725: Improve performance of ColorConvertOp for default destinations with alpha
  • 7c2f77a42293eb79829fce99bfce82e89a5df6d7: 8300584: Accelerate AVX-512 CRC32C for small buffers
  • 5784eb7b68a880e130fda5f07c527187764038a2: 8300721: Cleanup ProblemList-svc-vthread.txt
  • 9d44dd0cca620ef8e16e0c4306e6e54d8de6d1e8: 8297972: Poly1305 Endianness on ByteBuffer not enforced
  • facd41511b972e940ecab3bc57f5f23efca43343: 8297757: VarHandles.getStaticFieldFromBaseAndOffset should get the receiver type from VarHandle
  • ... and 128 more: https://git.openjdk.org/jdk/compare/4be94e435002d9d6cb594a393e9e35d650a9a0c9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Jan 21 '23 18:01 openjdk[bot]