rust
                                
                                
                                
                                    rust copied to clipboard
                            
                            
                            
                        Optimize `array::IntoIter`
.into_iter() on arrays was slower than it needed to be (especially compared to slice iterator) since it uses Range<usize>, which needs to handle degenerate ranges like 10..4.
This PR adds an internal IndexRange type that's like Range<usize> but with a safety invariant that means it doesn't need to worry about those cases -- it only handles start <= end -- and thus can give LLVM more information to optimize better.
I added one simple demonstration of the improvement as a codegen test.
(vec::IntoIter uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed.  array::IntoIter can't use pointers because that would keep it from being movable.)
Hey! It looks like you've submitted a new PR for the library teams!
If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.
Examples of T-libs-api changes:
- Stabilizing library features
 - Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
 - Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
 - Changing public documentation in ways that create new stability guarantees
 - Changing observable runtime behavior of library APIs
 
r? @thomcc
(rust-highfive has picked a reviewer for you, use r? to override)
This regresses the vec::bench_flat_map_collect benchmark by a lot for me, over 10x.
When looking into this I noticed that array::IntoIter::fold calls fold on iter::ByRefSized(&mut self.alive), which as far as I can see doesn't call the actual fold implementation of IndexRange because ByRefSized would have to pass the iterator by value, which it can't. So using ByRefSized seems to be pointless here. However, it doesn't look like this is what's causing the slowdown.
Thanks, @timvermeulen, I'll look at that one.
I've sent PR #100220 to fix ByRefSized::fold, and will consider this PR blocked until that one goes in and I can confirm it solves the problem you raised.
@rustbot author
Update 2022-08-24: I've rebased atop the other PR, but still not investigated that perf test, so it's not really review-ready yet.
The job mingw-check failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
   Compiling memchr v2.5.0
   Compiling std v0.0.0 (/checkout/library/std)
   Compiling compiler_builtins v0.1.79
   Compiling unwind v0.0.0 (/checkout/library/unwind)
error: no rules expected the token `start`
    --> library/core/src/ops/index_range.rs:23:46
     |
23   |         unsafe { assert_unsafe_precondition!(start <= end) };
     |                                              ^^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro
     | --------------------------------------- when calling this macro
error: no rules expected the token `self`
     |
     |
296  |             assert_unsafe_precondition!(self.end() <= slice.len());
     |                                         ^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro
     | --------------------------------------- when calling this macro
error: no rules expected the token `self`
     |
     |
305  |             assert_unsafe_precondition!(self.end() <= slice.len());
     |                                         ^^^^ no rules expected this token in macro call
    ::: library/core/src/intrinsics.rs:2180:1
     |
2180 | macro_rules! assert_unsafe_precondition {
     | --------------------------------------- when calling this macro
@timvermeulen  My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to call it instead of inline it, with the obvious terrible consequences.
I now get exactly the same inner loop for that .flat_map(|color| color.rotate_left(8).to_be_bytes()) bench as without this PR:
.LBB0_5:
	mov	edx, dword ptr [rbx + rcx]
	rol	edx, 8
	bswap	edx
	mov	dword ptr [rax + rcx], edx
	add	rcx, 4
	cmp	rdi, rcx
	jne	.LBB0_5
@rustbot ready
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit c43b960c906694232a2713f5e8e50f4a24ab1367 with merge 415f589153b1ec566f690ea290e2e0721802b842...
:sunny: Try build successful - checks-actions
Build commit: 415f589153b1ec566f690ea290e2e0721802b842 (415f589153b1ec566f690ea290e2e0721802b842)
Queued 415f589153b1ec566f690ea290e2e0721802b842 with parent 5253b0a0a1f366fad0ebed57597fcf2703b9e893, future comparison URL.
Finished benchmarking commit (415f589153b1ec566f690ea290e2e0721802b842): comparison URL.
Overall result: ❌ regressions - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never @rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
- | - | 0 | 
| Regressions ❌  (secondary)  | 
1.6% | [0.5%, 2.0%] | 8 | 
| Improvements ✅  (primary)  | 
- | - | 0 | 
| Improvements ✅  (secondary)  | 
- | - | 0 | 
| All ❌✅ (primary) | - | - | 0 | 
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
2.7% | [2.7%, 2.7%] | 1 | 
| Regressions ❌  (secondary)  | 
- | - | 0 | 
| Improvements ✅  (primary)  | 
-1.8% | [-2.1%, -1.5%] | 3 | 
| Improvements ✅  (secondary)  | 
-2.4% | [-3.1%, -2.0%] | 10 | 
| All ❌✅ (primary) | -0.7% | [-2.1%, 2.7%] | 4 | 
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
2.1% | [2.1%, 2.1%] | 1 | 
| Regressions ❌  (secondary)  | 
3.1% | [2.6%, 3.6%] | 2 | 
| Improvements ✅  (primary)  | 
- | - | 0 | 
| Improvements ✅  (secondary)  | 
-2.9% | [-3.5%, -1.6%] | 14 | 
| All ❌✅ (primary) | 2.1% | [2.1%, 2.1%] | 1 | 
[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes
LGTM!
@bors r+
:pushpin: Commit fb3a7d8dd4c440f93a0045db0510c485e1330eb9 has been approved by thomcc
It is now in the queue for this repository.
:hourglass: Testing commit fb3a7d8dd4c440f93a0045db0510c485e1330eb9 with merge 088afb5277a06b80285efe8811d7219142e50cbb...
:broken_heart: Test failed - checks-actions
The job x86_64-gnu-stable failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
failures:
---- [codegen] src/test/codegen/slice-iter-len-eq-zero.rs stdout ----
error: verification with 'FileCheck' failed
status: exit status: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll" "/checkout/src/test/codegen/slice-iter-len-eq-zero.rs" "--allow-unused-prefixes" "--check-prefixes" "CHECK,NONMSVC"
stdout: none
--- stderr -------------------------------
/checkout/src/test/codegen/slice-iter-len-eq-zero.rs:22:16: error: CHECK-NOT: excluded string found in input
 // CHECK-NOT: icmp
               ^
/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll:18:17: note: found here
 %_18.i.i.i.i = icmp ugt i64 %_3.i, 123
Input file: /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-iter-len-eq-zero/slice-iter-len-eq-zero.ll
-dump-input=help explains the following input dump.
Input was:
<<<<<<
        .
        .
        .
        .
       13: ; Function Attrs: nonlazybind uwtable 
       14: define noundef zeroext i1 @array_into_iter_len_eq_zero(ptr noalias nocapture noundef readonly dereferenceable(392) %y) unnamed_addr #1 personality ptr @rust_eh_personality { 
       15: bb1: 
       16:  %0 = getelementptr inbounds { i64, i64 }, ptr %y, i64 0, i32 1 
       17:  %_3.i = load i64, ptr %0, align 8, !alias.scope !2 
       18:  %_18.i.i.i.i = icmp ugt i64 %_3.i, 123 
not:22                     !~~~                     error: no match expected
       19:  br i1 %_18.i.i.i.i, label %bb1.i.i.i.i, label %"_ZN4core3ptr91drop_in_place$LT$core..array..iter..IntoIter$LT$$u5b$u8$u3b$$u20$3$u5d$$C$123_usize$GT$$GT$17hbdfa8dea7c1f446aE.exit" 
       20:  
       21: bb1.i.i.i.i: ; preds = %bb1 
       22:  tail call void @llvm.trap(), !noalias !5 
       23:  unreachable 
        .
        .
>>>>>>
------------------------------------------
@rustbot author
Oh, it's failing on the UB trap check.  I've ignored the codegen test for debug, since it's a -O test.
@bors r=thomcc
:pushpin: Commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 has been approved by thomcc
It is now in the queue for this repository.
:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3...
@bors retry (cycling a higher priority PR)
https://github.com/rust-lang/rust/pull/102028
:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe...
:broken_heart: Test failed - checks-actions
Looks like the retry didn't put it back in the queue right? I'll try it again.
@bors retry
A job failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
:hourglass: Testing commit 6dbd9a29c21db63e2c72f5e7f4f8b5ba58023875 with merge 4ecfdfac51b159f68fce608792affb34a70e6f73...
A job failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
:sunny: Test successful - checks-actions Approved by: thomcc Pushing 4ecfdfac51b159f68fce608792affb34a70e6f73 to master...
Finished benchmarking commit (4ecfdfac51b159f68fce608792affb34a70e6f73): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
- | - | 0 | 
| Regressions ❌  (secondary)  | 
- | - | 0 | 
| Improvements ✅  (primary)  | 
-0.4% | [-0.4%, -0.4%] | 1 | 
| Improvements ✅  (secondary)  | 
- | - | 0 | 
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 | 
Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
- | - | 0 | 
| Regressions ❌  (secondary)  | 
1.7% | [0.9%, 2.5%] | 2 | 
| Improvements ✅  (primary)  | 
- | - | 0 | 
| Improvements ✅  (secondary)  | 
- | - | 0 | 
| All ❌✅ (primary) | - | - | 0 | 
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
| mean[^1] | range | count[^2] | |
|---|---|---|---|
| Regressions ❌  (primary)  | 
- | - | 0 | 
| Regressions ❌  (secondary)  | 
- | - | 0 | 
| Improvements ✅  (primary)  | 
-2.0% | [-2.0%, -2.0%] | 1 | 
| Improvements ✅  (secondary)  | 
-3.6% | [-3.6%, -3.6%] | 1 | 
| All ❌✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 | 
[^1]: the arithmetic mean of the percent change [^2]: number of relevant changes