rust icon indicating copy to clipboard operation
rust copied to clipboard

Optimize `array::IntoIter`

Open scottmcm opened this issue 3 years ago • 14 comments

.into_iter() on arrays was slower than it needed to be (especially compared to slice iterator) since it uses Range<usize>, which needs to handle degenerate ranges like 10..4.

This PR adds an internal IndexRange type that's like Range<usize> but with a safety invariant that means it doesn't need to worry about those cases -- it only handles start <= end -- and thus can give LLVM more information to optimize better.

I added one simple demonstration of the improvement as a codegen test.

(vec::IntoIter uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed. array::IntoIter can't use pointers because that would keep it from being movable.)

scottmcm avatar Aug 06 '22 23:08 scottmcm

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

rustbot avatar Aug 06 '22 23:08 rustbot

r? @thomcc

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Aug 06 '22 23:08 rust-highfive

This regresses the vec::bench_flat_map_collect benchmark by a lot for me, over 10x.

When looking into this I noticed that array::IntoIter::fold calls fold on iter::ByRefSized(&mut self.alive), which as far as I can see doesn't call the actual fold implementation of IndexRange because ByRefSized would have to pass the iterator by value, which it can't. So using ByRefSized seems to be pointless here. However, it doesn't look like this is what's causing the slowdown.

timvermeulen avatar Aug 07 '22 02:08 timvermeulen

Thanks, @timvermeulen, I'll look at that one.

I've sent PR #100220 to fix ByRefSized::fold, and will consider this PR blocked until that one goes in and I can confirm it solves the problem you raised.

@rustbot author

Update 2022-08-24: I've rebased atop the other PR, but still not investigated that perf test, so it's not really review-ready yet.

scottmcm avatar Aug 07 '22 03:08 scottmcm

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
   Compiling memchr v2.5.0
   Compiling std v0.0.0 (/checkout/library/std)
   Compiling compiler_builtins v0.1.79
   Compiling unwind v0.0.0 (/checkout/library/unwind)
error: no rules expected the token `start`
    --> library/core/src/ops/index_range.rs:23:46
     |
23   |         unsafe { assert_unsafe_precondition!(start <= end) };
     |                                              ^^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro
     | --------------------------------------- when calling this macro

error: no rules expected the token `self`
     |
     |
296  |             assert_unsafe_precondition!(self.end() <= slice.len());
     |                                         ^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro
     | --------------------------------------- when calling this macro

error: no rules expected the token `self`
     |
     |
305  |             assert_unsafe_precondition!(self.end() <= slice.len());
     |                                         ^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro

rust-log-analyzer avatar Sep 17 '22 21:09 rust-log-analyzer

@timvermeulen My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to call it instead of inline it, with the obvious terrible consequences.

I now get exactly the same inner loop for that .flat_map(|color| color.rotate_left(8).to_be_bytes()) bench as without this PR:

.LBB0_5:
	mov	edx, dword ptr [rbx + rcx]
	rol	edx, 8
	bswap	edx
	mov	dword ptr [rax + rcx], edx
	add	rcx, 4
	cmp	rdi, rcx
	jne	.LBB0_5

@rustbot ready

scottmcm avatar Sep 18 '22 03:09 scottmcm

@bors try @rust-timer queue

thomcc avatar Sep 18 '22 03:09 thomcc

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

rust-timer avatar Sep 18 '22 03:09 rust-timer

:hourglass: Trying commit c43b960c906694232a2713f5e8e50f4a24ab1367 with merge 415f589153b1ec566f690ea290e2e0721802b842...

bors avatar Sep 18 '22 03:09 bors

:sunny: Try build successful - checks-actions Build commit: 415f589153b1ec566f690ea290e2e0721802b842 (415f589153b1ec566f690ea290e2e0721802b842)

bors avatar Sep 18 '22 04:09 bors

Queued 415f589153b1ec566f690ea290e2e0721802b842 with parent 5253b0a0a1f366fad0ebed57597fcf2703b9e893, future comparison URL.

rust-timer avatar Sep 18 '22 04:09 rust-timer

Finished benchmarking commit (415f589153b1ec566f690ea290e2e0721802b842): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.6% [0.5%, 2.0%] 8
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
2.7% [2.7%, 2.7%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.8% [-2.1%, -1.5%] 3
Improvements ✅
(secondary)
-2.4% [-3.1%, -2.0%] 10
All ❌✅ (primary) -0.7% [-2.1%, 2.7%] 4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
2.1% [2.1%, 2.1%] 1
Regressions ❌
(secondary)
3.1% [2.6%, 3.6%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.9% [-3.5%, -1.6%] 14
All ❌✅ (primary) 2.1% [2.1%, 2.1%] 1

[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes

rust-timer avatar Sep 18 '22 06:09 rust-timer

LGTM!

@bors r+

thomcc avatar Sep 18 '22 17:09 thomcc

:pushpin: Commit fb3a7d8dd4c440f93a0045db0510c485e1330eb9 has been approved by thomcc

It is now in the queue for this repository.

bors avatar Sep 18 '22 17:09 bors

:hourglass: Testing commit fb3a7d8dd4c440f93a0045db0510c485e1330eb9 with merge 088afb5277a06b80285efe8811d7219142e50cbb...

bors avatar Sep 18 '22 22:09 bors

:broken_heart: Test failed - checks-actions

bors avatar Sep 18 '22 22:09 bors

The job x86_64-gnu-stable failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
failures:

---- [codegen] src/test/codegen/slice-iter-len-eq-zero.rs stdout ----

error: verification with 'FileCheck' failed
status: exit status: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll" "/checkout/src/test/codegen/slice-iter-len-eq-zero.rs" "--allow-unused-prefixes" "--check-prefixes" "CHECK,NONMSVC"
stdout: none
--- stderr -------------------------------
/checkout/src/test/codegen/slice-iter-len-eq-zero.rs:22:16: error: CHECK-NOT: excluded string found in input
 // CHECK-NOT: icmp
               ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll:18:17: note: found here
 %_18.i.i.i.i = icmp ugt i64 %_3.i, 123


Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll


-dump-input=help explains the following input dump.
Input was:
<<<<<<
        .
        .
        .
        .
       13: ; Function Attrs: nonlazybind uwtable 
       14: define noundef zeroext i1 @array_into_iter_len_eq_zero(ptr noalias nocapture noundef readonly dereferenceable(392) %y) unnamed_addr #1 personality ptr @rust_eh_personality { 
       15: bb1: 
       16:  %0 = getelementptr inbounds { i64, i64 }, ptr %y, i64 0, i32 1 
       17:  %_3.i = load i64, ptr %0, align 8, !alias.scope !2 
       18:  %_18.i.i.i.i = icmp ugt i64 %_3.i, 123 
not:22                     !~~~                     error: no match expected
       19:  br i1 %_18.i.i.i.i, label %bb1.i.i.i.i, label %"_ZN4core3ptr91drop_in_place$LT$core..array..iter..IntoIter$LT$$u5b$u8$u3b$$u20$3$u5d$$C$123_usize$GT$$GT$17hbdfa8dea7c1f446aE.exit" 
       20:  
       21: bb1.i.i.i.i: ; preds = %bb1 
       22:  tail call void @llvm.trap(), !noalias !5 
       23:  unreachable 
        .
        .
>>>>>>
------------------------------------------

rust-log-analyzer avatar Sep 18 '22 22:09 rust-log-analyzer

@rustbot author

thomcc avatar Sep 19 '22 20:09 thomcc

Oh, it's failing on the UB trap check. I've ignored the codegen test for debug, since it's a -O test.

@bors r=thomcc

scottmcm avatar Sep 20 '22 07:09 scottmcm

:pushpin: Commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 has been approved by thomcc

It is now in the queue for this repository.

bors avatar Sep 20 '22 07:09 bors

:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3...

bors avatar Sep 20 '22 15:09 bors

@bors retry (cycling a higher priority PR)

https://github.com/rust-lang/rust/pull/102028

oli-obk avatar Sep 20 '22 16:09 oli-obk

:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe...

bors avatar Sep 20 '22 16:09 bors

:broken_heart: Test failed - checks-actions

bors avatar Sep 20 '22 16:09 bors

Looks like the retry didn't put it back in the queue right? I'll try it again.

@bors retry

scottmcm avatar Sep 20 '22 22:09 scottmcm

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

rust-log-analyzer avatar Sep 21 '22 00:09 rust-log-analyzer

:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge 4ecfdfac51b159f68fce608792affb34a70e6f73...

bors avatar Sep 21 '22 00:09 bors

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

rust-log-analyzer avatar Sep 21 '22 03:09 rust-log-analyzer

:sunny: Test successful - checks-actions Approved by: thomcc Pushing 4ecfdfac51b159f68fce608792affb34a70e6f73 to master...

bors avatar Sep 21 '22 03:09 bors

Finished benchmarking commit (4ecfdfac51b159f68fce608792affb34a70e6f73): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-0.4%, -0.4%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.7% [0.9%, 2.5%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean[^1] range count[^2]
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.0% [-2.0%, -2.0%] 1
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -2.0% [-2.0%, -2.0%] 1

[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes

rust-timer avatar Sep 21 '22 04:09 rust-timer