rust
rust copied to clipboard
Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable
- Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
- Unstable sort jumps directly to the branchless heapsort fallback.
- select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.
Some code is duplicated but de-duplication would bring it's own problems. For example swap_if_less is critical for performance, if the sorting networks don't inline it perf drops drastically, however #[inline(always)] is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate swap_if_less once and call it. Another aspect that would suffer when making swap_if_less pub, is having to cfg out dozens of functions in in smallsort module.
Part of https://github.com/rust-lang/rust/issues/125612
r? @Kobzol
r? @scottmcm
rustbot has assigned @scottmcm. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
r? @Kobzol
Could not assign reviewer from: Kobzol.
User(s) Kobzol are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.
The job mingw-check failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
#16 2.865 Building wheels for collected packages: reuse
#16 2.866 Building wheel for reuse (pyproject.toml): started
#16 3.117 Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.118 Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.119 Stored in directory: /tmp/pip-ephem-wheel-cache-nq13luie/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.121 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.532 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.532 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.6s
r? libs
One thing that is still outstanding is, how do we document the different implementation behavior?
I think adding even more text to the 10 places that document the current implementation of sort, sort_unstable and select_nth_unstable isn't a good idea. Maybe there could be something in a specific Rust book that mentions the implications of using optimize_for_size.
That said I think with how recent and WIP optimize_for_size still is, I'd say it's not a blocker for this PR.
I think the open points have been addressed, if not, please re-open. Good to be merged from my side.
Would be nice to measure the binary-size impact of this change in a real setting, here are the result of our tool that we also used in the design documents:
slice::sort:
| Configuration | Type | Size default (bytes) | Size optimize_for_size (bytes) |
|---|---|---|---|
| release | u64 |
5304 | 602 |
| release | String |
6630 | 838 |
| release_lto_thin | u64 |
5317 | 570 |
| release_lto_thin | String |
6607 | 806 |
| release_lto_thin_opt_level_s | u64 |
3748 | 584 |
| release_lto_thin_opt_level_s | String |
5448 | 813 |
slice::sort_unstable:
| Configuration | Type | Size default (bytes) | Size optimize_for_size (bytes) |
|---|---|---|---|
| release | u64 |
3308 | 182 |
| release | String |
4968 | 488 |
| release_lto_thin | u64 |
3308 | 182 |
| release_lto_thin | String |
4968 | 488 |
| release_lto_thin_opt_level_s | u64 |
3034 | 168 |
| release_lto_thin_opt_level_s | String |
4281 | 410 |
slice::select_nth_unstable: index = v.len() / 2
| Configuration | Type | Size default (bytes) | Size optimize_for_size (bytes) |
|---|---|---|---|
| release | u64 |
4331 | 3075 |
| release | String |
6204 | 4002 |
| release_lto_thin | u64 |
4331 | 3075 |
| release_lto_thin | String |
6204 | 4002 |
| release_lto_thin_opt_level_s | u64 |
3081 | 2032 |
| release_lto_thin_opt_level_s | String |
5603 | 3697 |
Pushed a bunch of changes, and updated the table above.
Some perf results:
LGTM, and thanks for the detailed tables and graphs as well!
@bors r+ rollup=never (just to be sure of independent perf)
:pushpin: Commit 54391983486d5271c378821c8e6a8f018dfdd14d has been approved by cuviper
It is now in the queue for this repository.
:hourglass: Testing commit 54391983486d5271c378821c8e6a8f018dfdd14d with merge 363ae4188316b8b22cf6c1890bc73d84d05f70a4...
:sunny: Test successful - checks-actions Approved by: cuviper Pushing 363ae4188316b8b22cf6c1890bc73d84d05f70a4 to master...
Finished benchmarking commit (363ae4188316b8b22cf6c1890bc73d84d05f70a4): comparison URL.
Overall result: ❌ regressions - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.
@rustbot label: +perf-regression cc @rust-lang/wg-compiler-performance
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| mean | range | count | |
|---|---|---|---|
| Regressions ❌ (primary) |
0.3% | [0.2%, 0.3%] | 3 |
| Regressions ❌ (secondary) |
- | - | 0 |
| Improvements ✅ (primary) |
- | - | 0 |
| Improvements ✅ (secondary) |
- | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.3%] | 3 |
Max RSS (memory usage)
Results (primary 5.7%, secondary 2.6%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean | range | count | |
|---|---|---|---|
| Regressions ❌ (primary) |
5.7% | [4.0%, 9.1%] | 3 |
| Regressions ❌ (secondary) |
2.6% | [2.0%, 3.1%] | 3 |
| Improvements ✅ (primary) |
- | - | 0 |
| Improvements ✅ (secondary) |
- | - | 0 |
| All ❌✅ (primary) | 5.7% | [4.0%, 9.1%] | 3 |
Cycles
Results (primary -0.5%, secondary 3.8%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean | range | count | |
|---|---|---|---|
| Regressions ❌ (primary) |
- | - | 0 |
| Regressions ❌ (secondary) |
3.8% | [2.8%, 4.3%] | 5 |
| Improvements ✅ (primary) |
-0.5% | [-0.5%, -0.5%] | 1 |
| Improvements ✅ (secondary) |
- | - | 0 |
| All ❌✅ (primary) | -0.5% | [-0.5%, -0.5%] | 1 |
Binary size
Results (primary 0.0%, secondary -0.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean | range | count | |
|---|---|---|---|
| Regressions ❌ (primary) |
0.2% | [0.0%, 0.8%] | 14 |
| Regressions ❌ (secondary) |
- | - | 0 |
| Improvements ✅ (primary) |
-0.1% | [-0.1%, -0.0%] | 21 |
| Improvements ✅ (secondary) |
-0.1% | [-0.2%, -0.1%] | 38 |
| All ❌✅ (primary) | 0.0% | [-0.1%, 0.8%] | 35 |
Bootstrap: 767.328s -> 768.432s (0.14%) Artifact size: 340.88 MiB -> 340.84 MiB (-0.01%)
Not sure the rust-timer run tells us anything useful here, given that all changes here happened to code behind a libcore feature that isn't being built IIUC.
I don't think that further investigation is needed. This has seemingly perturbed codegen a little bit, but otherwise it's hidden behind an optional flag. The image regression has flipped back right after.
@rustbot label: +perf-regression-triaged