jdk
jdk copied to clipboard
8325679: Optimize ArrayList subList sort
Somewhat surprisingly, ArrayList$Sublist.sort() is not specialized and will thus fall back to slower default method of List.sort() instead of sorting a range of the array in-place in its backing root ArrayList.
This doesn't change observable behavior, so haven't added tests, and tier1 tests still all pass except for test/jdk/java/util/Locale/LocaleProvidersFormat.java which also currently fails on master too on the machine I tested on.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8325679: Optimize ArrayList subList sort (Enhancement - P3)
Reviewers
- @tsypanovs (no known openjdk.org user name / role)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17818/head:pull/17818
$ git checkout pull/17818
Update a local copy of the PR:
$ git checkout pull/17818
$ git pull https://git.openjdk.org/jdk.git pull/17818/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17818
View PR using the GUI difftool:
$ git pr show -t 17818
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17818.diff
Webrev
:wave: Welcome back attila! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@szegedi The following label will be automatically applied to this pull request:
core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 04: Full - Incremental (ed50c3da)
- 03: Full - Incremental (a5537664)
- 02: Full - Incremental (5bfe2d12)
- 01: Full - Incremental (33770e1e)
- 00: Full (680cb10f)
Been there since day one?
@szegedi Oh interesting catch. Looks like a reasonable optimization. Is this covered by a test, or should a new test be added?
@JimLaskey The List.sort method was added in JDK 8, so not really day one. Prior to that one had to call Collections.sort which copied to a temp array, sorted, then copied back; this applied equally to full lists and sublists. Since JDK 8, ArrayList.sort on the full list did sorting in place but sorting a subList just inherited the default method (which uses the old copy-to-temp-and-sort technique). My guess is that nobody noticed this because sublists aren't used that much, and sorting of subarrays, while arguably necessary for completeness, probably aren't used all that much either.
@stuart-marks you're right that sorting of sublists is rare. I did run into a need for it a few months ago in my day job though, that's how I stumbled upon this. Granted, it was a performance optimization and not a correctness necessity. See, I have a large list of data I read from an external location. Some elements – in at most one identifiable but arbitrarily large stride in the large list – need some extra processing. The algorithm for this processing can be written to be more efficient if it can presume those elements are sorted.
I could still sort the entire list (as the ordering doesn't matter to the final receiver), using a sort that just compares irrelevant elements as equal, so they'd remain as large "already sorted" strides, and only apply the real sort on only the relevant elements:
static int fullListCompare(Data d1, Data d2) {
if (isRelevant(d1)) {
if (isRelevant(d2)) {
return sublistCompare(d1, d2);
} else {
return -1;
}
} else if (isRelevant(d2)) {
return 1;
}
return 0;
}
This'd also have the side effect of moving all the relevant entries to the beginning of the list, but while that's not a problem, it's not necessary either. For this reason, I prefer to scan the list for the stride, reify it as a sublist, and only run a sort with sublistCompare on it, and then also do the extra processing of the elements on the sublist too.
So yeah, it is rare this pops up, but it does happen :-)
I'm not sure if there's a test that already exercises sublist sorts. I did find what seems like a setup for a TCK test for ArrayList that seems like it is specifically creating a test suite for sublists too, but I haven't looked more closely into how is that supposed to work. I'm happy to write some tests specifically for ArrayList sublist sort if this TCK one doesn't exercise it.
My guess is that nobody noticed this because sublists aren't used that much, and sorting of subarrays, while arguably necessary for completeness, probably aren't used all that much either.
FWIW, CopyOnWriteArrayList.COWSubList overrides sort.
Just to clarify, my comments about "sorting sublists is rare" is a response to "has this been there since day one?" from @JimLaskey, and it's not an argument against adding this. Maybe it's a bit surprising that it hasn't been done, but maybe not, as the case is somewhat rare. (There are a lot of cases in the collections where various collection views inherit slow implementations from default methods or from the Abstract implementations, because nobody has really cared enough to provide optimized implementations, and it would add bulk to the code.)
CopyOnWriteArrayList needs to override subList.sort since it potentially modifies the array, and COWAL needs to make a copy before making any modifications.
Regarding testing: the tests in test/jdk/java/util/concurrent/tck are supposed to be the conformance tests for the java.util.concurrent package, so it'd be somewhat odd to have functional tests for non-juc classes added there. (I did find it surprising that there are ArrayList tests there, but a lot of work in the core collections was done in the context of JSR166, so maybe I shouldn't be too surprised.)
There appears to be some coverage of the List.sort() default method in `test/jdk/java/util/List/ListDefault.java'. It does seem to cover sublists. I guess the question is whether this test is focused on the default implementation of List.sort(), or whether it's intended to cover all implementations -- whether the default or overridden -- of the default methods that were added in JDK 8. I think this area is worth a closer look. If the new ArrayList.subList().sort() override is covered already, we might be able to get away without adding a new test. Or possibly some adjustment could be made to this test if necessary.
CopyOnWriteArrayList needs to override subList.sort since it potentially modifies the array, and COWAL needs to make a copy before making any modifications.
I admit, I didn't look into that much; I just skimmed through some overrides of List.sort. That said, does it mean that before List.sort, Collections.sort, which had the same implementation that the default List.sort now has, would not be able to be properly sort CopyOnWriteArrayList?
(Sorry, if I digress.)
(Discussion mainly of historical interest.)
@pavelrappo Correct, historically, Collections.sort would fail to sort CopyOnWriteArrayList. You have to go back to JDK 7 to see this. The sorting approach used by Collections.sort (still present in the default implementation of List.sort) gets an array using toArray(), sorts it, and then copies the sorted elements back using ListIterator.set. Since CopyOnWriteArrayList doesn't support modifications using ListIterator, it fails with UnsupportedOperationException. The overrides of List.sort have fixed this problem.
COWAL still has some problems with other things that use similar techniques, such as Collections.shuffle. That uses get/set to swap elements, which COWAL does support, but it copies the array on each set() operation. This results in N copies of the array being made for an N-element list.
@stuart-marks > Just to clarify, my comments about "sorting sublists is rare" is a response to "has this been there since day one?" from @JimLaskey, and it's not an argument against adding this.
That's absolutely understood :-) I was just trying to add some color by describing the use case I stumbled upon.
There appears to be some coverage of the List.sort() default method in `test/jdk/java/util/List/ListDefault.java'. It does seem to cover sublists. I guess the question is whether this test is focused on the default implementation of List.sort(), or whether it's intended to cover all implementations -- whether the default or overridden -- of the default methods that were added in JDK 8. I think this area is worth a closer look. If the new ArrayList.subList().sort() override is covered already, we might be able to get away without adding a new test. Or possibly some adjustment could be made to this test if necessary.
I confirmed that the code in ListDefaults.java:403 exercises ArrayList$SubList.sort(). In the meantime, I also added a more exhaustive stochastic test, although it might be an overkill. I'm inclined to remove it.
@szegedi Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
@szegedi This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8325679: Optimize ArrayList subList sort
Reviewed-by: liach
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 2538 new commits pushed to the master branch:
- e25a9e7fd86e4eaf020e54021efaa7059dc654c9: 8339486: JFR: Modernize
- 4e2dde2f0d6f96d5f07020d2417189f411c4596a: 8339371: jlink.log warning when building after JDK-8338404
- 7ad61605f1669f51a97f4f263a7afaa9ab7706be: 8339163: ZGC: Race in clearing of remembered sets
- a61860511f67038962c54e114599948ca103dae8: 8339399: ZGC: Remove unnecessary page reset when splitting pages
- f2c992c5af021ab0ff8429fd261314bc7e01f7df: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms
- 9a1024dec68057c7c581ac0a38fc7f96489a0a76: 8190329: [macos] Swing InterOp Platform.exit() crash
- 5998f4b6f53769f98188ae8c23ea49cc1f7aa802: 8308588: Unnecessary synchronized on GTKStyle#ICONS_MAP can be removed
- 90f3f4325772773f1dc1814c56d7326d5389e2c7: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths
- bbb516163d400a9c7e923e423fe2a60091b59322: 8337664: Distrust TLS server certificates issued after Oct 2024 and anchored by Entrust Root CAs
- a22e932ab838762a013fc25b8061165be93feeb3: 8337163: Improve SA error message when failing to attach to a core file
- ... and 2528 more: https://git.openjdk.org/jdk/compare/2ed889b7f217a7a21edee317d93b9b533edde578...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@szegedi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Hey folks (@stuart-marks or @JimLaskey) it's been a while – can I get a review?
@szegedi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep it open, please.
@szegedi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Keep it open, please.
@szegedi This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@szegedi
Hi Attila, sorry for the long delay on this. If you update the PR per @liach's comment on 2024-07-09, I can sponsor it. Thanks.
/integrate
Going to push as commit c7d15f1fe09e61c1e61ee253e7e3df4c2b9306a1.
Since your change was applied there have been 2539 commits pushed to the master branch:
- bd8569bc6cc888cbf514e9329e2c24a059d89711: 8339131: Remove rarely-used accessor methods from Opcode
- e25a9e7fd86e4eaf020e54021efaa7059dc654c9: 8339486: JFR: Modernize
- 4e2dde2f0d6f96d5f07020d2417189f411c4596a: 8339371: jlink.log warning when building after JDK-8338404
- 7ad61605f1669f51a97f4f263a7afaa9ab7706be: 8339163: ZGC: Race in clearing of remembered sets
- a61860511f67038962c54e114599948ca103dae8: 8339399: ZGC: Remove unnecessary page reset when splitting pages
- f2c992c5af021ab0ff8429fd261314bc7e01f7df: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms
- 9a1024dec68057c7c581ac0a38fc7f96489a0a76: 8190329: [macos] Swing InterOp Platform.exit() crash
- 5998f4b6f53769f98188ae8c23ea49cc1f7aa802: 8308588: Unnecessary synchronized on GTKStyle#ICONS_MAP can be removed
- 90f3f4325772773f1dc1814c56d7326d5389e2c7: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths
- bbb516163d400a9c7e923e423fe2a60091b59322: 8337664: Distrust TLS server certificates issued after Oct 2024 and anchored by Entrust Root CAs
- ... and 2529 more: https://git.openjdk.org/jdk/compare/2ed889b7f217a7a21edee317d93b9b533edde578...master
Your commit was automatically rebased without conflicts.
@szegedi Pushed as commit c7d15f1fe09e61c1e61ee253e7e3df4c2b9306a1.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.