
internal iteration for `&mut I`

Open · sarah-quinones opened this issue 3 years ago • 11 comments

This PR implements internal iteration for `&mut I` when `I: Sized`. It additionally marks some wrapper functions `#[inline]` that were not previously inline, which seems to speed things up by a fair amount in some cases.
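To make the idea concrete: `Iterator::try_fold` takes `&mut self` but carries a `Self: Sized` bound, so an impl for `&mut I` can hand its whole loop to the inner iterator's `try_fold` only when `I: Sized`. The sketch below shows that forwarding on a hypothetical wrapper, `ByMut`, standing in for `&mut I` (user code cannot change std's own impl); it is an illustration under those assumptions, not the code from this PR. It overrides `fold` rather than `try_fold` because overriding `try_fold` requires the unstable `Try` trait, and it uses `Result<B, Infallible>` as a `Try` type that can never short-circuit.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for `&mut I`, only here to show the forwarding.
pub struct ByMut<'a, I> {
    pub inner: &'a mut I,
}

impl<'a, I: Iterator> Iterator for ByMut<'a, I> {
    type Item = I::Item;

    // External iteration: one element per call, forwarded to the inner iterator.
    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }

    // Internal iteration: delegate the whole loop to the inner iterator's
    // `try_fold`, which only needs `&mut I` (plus `I: Sized`, implied by the
    // generic parameter here). Without this override, the default `fold`
    // would call `next()` once per element.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        // `Result<B, Infallible>` can never hold `Err`, so this is a plain fold
        // that reuses whatever specialized `try_fold` the inner iterator has.
        let folded: Result<B, Infallible> =
            self.inner.try_fold(init, |acc, item| Ok(f(acc, item)));
        match folded {
            Ok(acc) => acc,
            Err(never) => match never {},
        }
    }
}
```

Because the wrapper only borrows the iterator, the caller keeps whatever is left: after `next()` takes the first element, `fold` consumes the rest, and the original iterator is then exhausted.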

This led to performance gains of up to 3x across the board in the `iter::` benches, with only a minor regression in `iter::bench_filter_sum`.

sarah-quinones · Aug 05 '22 16:08

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

rustbot · Aug 05 '22 16:08

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon.

Please see the contribution instructions for more information.

rust-highfive · Aug 05 '22 16:08

@bors try @rust-timer queue

scottmcm · Aug 05 '22 18:08

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer · Aug 05 '22 18:08

⌛ Trying commit cb7f7ee07f6d67e36cc08b337c034f5e81343dc1 with merge 3e685715a7ece536b2ab653e3433c06c00454bdf...

bors · Aug 05 '22 18:08

☀️ Try build successful - checks-actions
Build commit: 3e685715a7ece536b2ab653e3433c06c00454bdf

bors · Aug 05 '22 20:08

Queued 3e685715a7ece536b2ab653e3433c06c00454bdf with parent d77da9da84fc89908ad01578c33c2dca8f597ffe, future comparison URL.

rust-timer · Aug 05 '22 20:08

Finished benchmarking commit (3e685715a7ece536b2ab653e3433c06c00454bdf): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions 😿 (primary) | 0.8% | 25.3% | 66 |
| Regressions 😿 (secondary) | 0.6% | 1.9% | 32 |
| Improvements 🎉 (primary) | -0.5% | -1.5% | 14 |
| Improvements 🎉 (secondary) | -0.6% | -1.4% | 22 |
| All 😿🎉 (primary) | 0.6% | 25.3% | 80 |

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 4.0% | 4.0% | 1 |
| Improvements 🎉 (primary) | -2.3% | -2.3% | 1 |
| Improvements 🎉 (secondary) | -2.5% | -2.5% | 1 |
| All 😿🎉 (primary) | -2.3% | -2.3% | 1 |

Cycles

Results
  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions 😿 (primary) | 14.7% | 37.5% | 3 |
| Regressions 😿 (secondary) | 2.8% | 3.0% | 2 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 14.7% | 37.5% | 3 |

[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes
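Reading "percent change" as the per-benchmark relative change of the metric (an assumption; the bot does not spell it out), the mean in footnote 1 over the n relevant changes of a row (footnote 2), with x_i the metric for benchmark i, is the first formula below. As a consistency check against the instruction-count table, the "All (primary)" mean is then recovered, up to rounding, as the count-weighted average of the primary regression and improvement rows.

```latex
\bar{\Delta} = \frac{1}{n}\sum_{i=1}^{n} 100\% \cdot \frac{x_i^{\mathrm{new}} - x_i^{\mathrm{old}}}{x_i^{\mathrm{old}}}

\bar{\Delta}_{\mathrm{All}} \approx \frac{66 \cdot 0.8\% + 14 \cdot (-0.5\%)}{66 + 14} = \frac{45.8\%}{80} \approx 0.57\% \approx 0.6\%
```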

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

rust-timer · Aug 05 '22 22:08

Wow, this totally changes the pattern of LTO in clap: [image not preserved]

scottmcm · Aug 05 '22 22:08

Well, these don't look like the most promising results ^^'

sarah-quinones · Aug 05 '22 22:08

Presumably this makes compilation slower, but the perf tests don't show the effect on the performance of the compiled code, right?

compiler-errors · Aug 06 '22 00:08

Are there tests that do?

sarah-quinones · Aug 06 '22 09:08

Other than the std benches (which aren't great), we don't have anything automated to assess runtime performance. In the rustc-perf suite, check and doc builds are the closest, since they don't codegen, but they're probably not diverse enough.

You could try paring down the PR by splitting out some of the changes. For example, some of the inlining in `function.rs` doesn't look relevant to iterators. You can also run rustc-perf locally and focus on that one benchmark, which should yield results more quickly (assuming you have a machine that can compile a stage1 rustc in a reasonable amount of time).

the8472 · Aug 06 '22 10:08

rustc-perf seems to take forever on my machine, and I can't display the results after it's finished, so that doesn't seem like a good option for me :/

sarah-quinones · Aug 06 '22 18:08

It can be set to run a subset of the benchmarks, e.g. only the serde ones; see https://github.com/rust-lang/rustc-perf/tree/master/collector#benchmarking-options. Running the site locally should work as long as it uses the same database that the collector generated.

the8472 · Aug 06 '22 18:08

Thanks for the tips! I managed to get it working with your help. It seems the biggest culprit was inlining the `ops::function` wrappers, but even without that I still get a 1-2% regression on `deeply-nested-multi`.

sarah-quinones · Aug 06 '22 21:08

I'm going to send this over to

r? @m-ou-se

because I think this is going to be as much a policy decision (about compile-vs-runtime) as it is about the code itself.

scottmcm · Aug 11 '22 19:08

Some changes were reverted, let's get new perf results.

@bors try @rust-timer queue

the8472 · Aug 11 '22 19:08

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer · Aug 11 '22 19:08

⌛ Trying commit f6a3462bda206ae937ff83760bf0d359fb2ddf0e with merge c20ee6d211784a78b94e26d37cce4e66acea976a...

bors · Aug 11 '22 19:08

☀️ Try build successful - checks-actions
Build commit: c20ee6d211784a78b94e26d37cce4e66acea976a

bors · Aug 11 '22 20:08

Queued c20ee6d211784a78b94e26d37cce4e66acea976a with parent aeb5067967ef58e4a324b19dd0dba2f385d5959f, future comparison URL.

rust-timer · Aug 11 '22 20:08

Finished benchmarking commit (c20ee6d211784a78b94e26d37cce4e66acea976a): comparison url.

Instruction count

  • Primary benchmarks: mixed results
  • Secondary benchmarks: ❌ relevant regressions found

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | 0.3% | 14 |
| Regressions ❌ (secondary) | 0.7% | 2.0% | 16 |
| Improvements ✅ (primary) | -0.4% | -0.7% | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | -0.7% | 21 |

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.4% | 4.5% | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.4% | -4.2% | 3 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results
  • Primary benchmarks: ✅ relevant improvement found
  • Secondary benchmarks: ✅ relevant improvement found

| | mean[^1] | max | count[^2] |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.3% | -2.3% | 1 |
| Improvements ✅ (secondary) | -4.1% | -4.1% | 1 |
| All ❌✅ (primary) | -2.3% | -2.3% | 1 |

[^1]: the arithmetic mean of the percent change
[^2]: number of relevant changes

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

rust-timer · Aug 11 '22 23:08

r? @the8472

m-ou-se · Dec 30 '22 13:12

The compile-time perf numbers are slightly negative, but less so than the previous attempt to do this.

But we need some runtime benchmark numbers to verify that it brings the expected benefits. There are some `core::iter` benchmarks that I'd expect to show some speedup.
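The `core::iter` benches in the rust-lang/rust tree use the unstable `#[bench]` harness. As a rough stable-Rust smoke test of the same question, a sketch like the following times a `fold` driven through `&mut I` (obtained with `Iterator::by_ref`) against a hand-written `next()` loop over a `Chain` + `Filter` pipeline. The pipeline, sizes, repetition count, and function names are illustrative assumptions; what the numbers show depends on the `&mut I` impl of the toolchain in use, and single-process `Instant` timings are noisy, so this is no substitute for the real benches or for rustc-perf.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Baseline: external iteration, one `next()` call per element.
// (Written as `while let` on purpose; clippy would suggest a `for` loop.)
pub fn sum_evens_external(n: u64) -> u64 {
    let mut it = (0..n).chain(0..n).filter(|x| x % 2 == 0);
    let mut acc = 0u64;
    while let Some(x) = it.next() {
        acc = acc.wrapping_add(x);
    }
    acc
}

// Same work, but `fold` is called on `&mut I` (via `by_ref()`), so this
// exercises whatever `fold` the `&mut I` impl of the std in use provides.
pub fn sum_evens_by_ref(n: u64) -> u64 {
    let mut it = (0..n).chain(0..n).filter(|x| x % 2 == 0);
    it.by_ref().fold(0u64, |acc, x| acc.wrapping_add(x))
}

// Runs `f` `reps` times; returns the last result and the fastest run.
pub fn best_of(reps: u32, mut f: impl FnMut() -> u64) -> (u64, Duration) {
    let mut best = Duration::MAX;
    let mut last = 0;
    for _ in 0..reps {
        let start = Instant::now();
        last = black_box(f());
        best = best.min(start.elapsed());
    }
    (last, best)
}

// Prints both timings after checking that the two variants agree.
pub fn compare(n: u64) {
    let n = black_box(n);
    let (a, t_external) = best_of(5, || sum_evens_external(n));
    let (b, t_by_ref) = best_of(5, || sum_evens_by_ref(n));
    assert_eq!(a, b);
    println!("next() loop:         {:?}", t_external);
    println!("fold through &mut I: {:?}", t_by_ref);
}
```

Calling `compare(1_000_000)` from a release-mode `main` gives a first impression; for numbers worth posting on a PR, the std benches or rustc-perf remain the place to look.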

@rustbot author

the8472 · Dec 30 '22 17:12

@sarah-ek any updates on this?

Dylan-DPC · Jan 23 '23 12:01

Closing this as inactive. Feel free to reöpen this PR or create a new PR if you get the time to work on this. Thanks.

Dylan-DPC · May 16 '23 12:05