rust icon indicating copy to clipboard operation
rust copied to clipboard

Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable

Open Voultapher opened this issue 1 year ago • 6 comments

  • Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
  • Unstable sort jumps directly to the branchless heapsort fallback.
  • select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.

Some code is duplicated but de-duplication would bring it's own problems. For example swap_if_less is critical for performance, if the sorting networks don't inline it perf drops drastically, however #[inline(always)] is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate swap_if_less once and call it. Another aspect that would suffer when making swap_if_less pub, is having to cfg out dozens of functions in in smallsort module.

Part of https://github.com/rust-lang/rust/issues/125612

r​? @Kobzol

Voultapher avatar Aug 25 '24 18:08 Voultapher

r? @scottmcm

rustbot has assigned @scottmcm. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot avatar Aug 25 '24 18:08 rustbot

r? @Kobzol

Voultapher avatar Aug 25 '24 18:08 Voultapher

Could not assign reviewer from: Kobzol. User(s) Kobzol are either the PR author, already assigned, or on vacation, and there are no other candidates. Use r? to specify someone else to assign.

rustbot avatar Aug 25 '24 18:08 rustbot

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
#16 2.865 Building wheels for collected packages: reuse
#16 2.866   Building wheel for reuse (pyproject.toml): started
#16 3.117   Building wheel for reuse (pyproject.toml): finished with status 'done'
#16 3.118   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#16 3.119   Stored in directory: /tmp/pip-ephem-wheel-cache-nq13luie/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#16 3.121 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#16 3.532 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#16 3.532 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#16 DONE 3.6s

rust-log-analyzer avatar Aug 25 '24 18:08 rust-log-analyzer

r? libs

Kobzol avatar Aug 25 '24 19:08 Kobzol

One thing that is still outstanding is, how do we document the different implementation behavior?

I think adding even more text to the 10 places that document the current implementation of sort, sort_unstable and select_nth_unstable isn't a good idea. Maybe there could be something in a specific Rust book that mentions the implications of using optimize_for_size.

That said I think with how recent and WIP optimize_for_size still is, I'd say it's not a blocker for this PR.

Voultapher avatar Aug 26 '24 07:08 Voultapher

I think the open points have been addressed, if not, please re-open. Good to be merged from my side.

Voultapher avatar Aug 29 '24 08:08 Voultapher

Would be nice to measure the binary-size impact of this change in a real setting, here are the result of our tool that we also used in the design documents:

slice::sort:

Configuration Type Size default (bytes) Size optimize_for_size (bytes)
release u64 5304 602
release String 6630 838
release_lto_thin u64 5317 570
release_lto_thin String 6607 806
release_lto_thin_opt_level_s u64 3748 584
release_lto_thin_opt_level_s String 5448 813

slice::sort_unstable:

Configuration Type Size default (bytes) Size optimize_for_size (bytes)
release u64 3308 182
release String 4968 488
release_lto_thin u64 3308 182
release_lto_thin String 4968 488
release_lto_thin_opt_level_s u64 3034 168
release_lto_thin_opt_level_s String 4281 410

slice::select_nth_unstable: index = v.len() / 2

Configuration Type Size default (bytes) Size optimize_for_size (bytes)
release u64 4331 3075
release String 6204 4002
release_lto_thin u64 4331 3075
release_lto_thin String 6204 4002
release_lto_thin_opt_level_s u64 3081 2032
release_lto_thin_opt_level_s String 5603 3697

Voultapher avatar Aug 29 '24 08:08 Voultapher

Pushed a bunch of changes, and updated the table above.

Voultapher avatar Sep 04 '24 18:09 Voultapher

Some perf results:

image

image

Voultapher avatar Sep 05 '24 11:09 Voultapher

LGTM, and thanks for the detailed tables and graphs as well!

@bors r+ rollup=never (just to be sure of independent perf)

cuviper avatar Sep 24 '24 17:09 cuviper

:pushpin: Commit 54391983486d5271c378821c8e6a8f018dfdd14d has been approved by cuviper

It is now in the queue for this repository.

bors avatar Sep 24 '24 17:09 bors

:hourglass: Testing commit 54391983486d5271c378821c8e6a8f018dfdd14d with merge 363ae4188316b8b22cf6c1890bc73d84d05f70a4...

bors avatar Sep 24 '24 18:09 bors

:sunny: Test successful - checks-actions Approved by: cuviper Pushing 363ae4188316b8b22cf6c1890bc73d84d05f70a4 to master...

bors avatar Sep 24 '24 21:09 bors

Finished benchmarking commit (363ae4188316b8b22cf6c1890bc73d84d05f70a4): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.3%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.3% [0.2%, 0.3%] 3

Max RSS (memory usage)

Results (primary 5.7%, secondary 2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
5.7% [4.0%, 9.1%] 3
Regressions ❌
(secondary)
2.6% [2.0%, 3.1%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 5.7% [4.0%, 9.1%] 3

Cycles

Results (primary -0.5%, secondary 3.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.8% [2.8%, 4.3%] 5
Improvements ✅
(primary)
-0.5% [-0.5%, -0.5%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.5% [-0.5%, -0.5%] 1

Binary size

Results (primary 0.0%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 0.8%] 14
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.1%, -0.0%] 21
Improvements ✅
(secondary)
-0.1% [-0.2%, -0.1%] 38
All ❌✅ (primary) 0.0% [-0.1%, 0.8%] 35

Bootstrap: 767.328s -> 768.432s (0.14%) Artifact size: 340.88 MiB -> 340.84 MiB (-0.01%)

rust-timer avatar Sep 24 '24 22:09 rust-timer

Not sure the rust-timer run tells us anything useful here, given that all changes here happened to code behind a libcore feature that isn't being built IIUC.

Voultapher avatar Sep 25 '24 13:09 Voultapher

I don't think that further investigation is needed. This has seemingly perturbed codegen a little bit, but otherwise it's hidden behind an optional flag. The image regression has flipped back right after.

@rustbot label: +perf-regression-triaged

Kobzol avatar Oct 01 '24 07:10 Kobzol