`slice::Iter` + `max` on array: Regression since Rust 1.65 (LLVM 15)
Code
I tried this code:
```rust
pub fn fna(x: [u8; 1024]) -> u8 {
    x.into_iter().max().unwrap()
}

pub fn fnb(x: [u8; 1024]) -> u8 {
    *x.iter().max().unwrap()
}
```
I expected to see this happen:
- Both functions should optimize away the `unwrap`
- (Bonus) Both functions should inline the respective iterator functions and auto-vectorize
Instead, this happened:
- `fna` has no `unwrap` in the assembly output, but `fnb` has.
- (Bonus) Both inline their respective iterators (according to the MIR output on the playground). `fna` auto-vectorizes, but `fnb` doesn't.
`fnb` generates a byte-wise max implementation. Interestingly, it keeps some information about the array length around, which influences the codegen. A byte-wise max implementation for `[u8; 1024]` needs to do 1023 comparisons (since the first value of the array serves as the initial max result value), and 1023 is divisible by 3, so the compiler emits a loop that runs 341 times, doing 3 max computations per iteration. If I change the length of the array to 1025, it does 4 max computations per iteration.
Even though the generated code relies on the number of array entries, it still doesn't elide the `Option::unwrap` check for the max result. In Rust 1.45-1.64, `fnb` also generated the 3-bytes-at-a-time loop, but did remove the `Option::unwrap`.
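For illustration, the scalar reduction described above can be sketched by hand (`fnb_scalar` is my name for it; this is not the compiler's actual output, just a sketch of the 1023-comparison structure):

```rust
// Hand-written sketch of the byte-wise reduction `fnb` lowers to:
// take x[0] as the initial max, then do 1023 comparisons over the
// remaining bytes. (Illustrative only, not the actual codegen.)
pub fn fnb_scalar(x: &[u8; 1024]) -> u8 {
    let mut max = x[0];
    for &b in &x[1..] {
        if b > max {
            max = b;
        }
    }
    max
}
```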
Version it worked on
It most recently worked on: Rust 1.64. (For context: Rust 1.65 added LLVM 15.)
It (initially?) started working in 1.45 (that version added LLVM 10).
Version with regression
Any version since Rust 1.65.
Since I thought this behavior was unexpected, I polled it on Mastodon before looking further into it. At the time of writing, about half of the respondents thought, like me, that both should be optimized (at least the `unwrap` part):
`fnb` is a by-reference iterator. If you use `.copied().max()` it vectorizes. See #106539
@the8472 ah, that explains the vectorization difference (bonus). just to clarify - that one is not the regression, just another thing I've noticed ^^
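For anyone hitting this in the meantime, the `.copied()` workaround mentioned above looks like this (a sketch; `fnb_copied` is my naming):

```rust
// Workaround sketch: iterating by value via `.copied()` lets LLVM
// vectorize the reduction, unlike the by-reference `x.iter().max()`.
pub fn fnb_copied(x: &[u8; 1024]) -> u8 {
    x.iter().copied().max().unwrap()
}
```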
The slice by-reference iterator fully unrolls and optimizes out the `unwrap` for len <= 10: https://rust.godbolt.org/z/1cT4z91eP (at length 11 LLVM still unrolls but can't figure out the `unwrap`)
That's not a power of 2 so I feel like the distinction between 10 and 11 is a magic number in some LLVM optimization.
It looks like the 10 -> 11 difference is in InstCombine: https://rust.godbolt.org/z/hKof6449W Looks like we're getting some kind of special treatment until the loop exceeds 10 iterations.
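A minimal pair for the 10-vs-11 boundary discussed above might look like this (function names are mine):

```rust
// At length 10 the by-reference iterator fully unrolls and the
// unwrap check disappears; at length 11 it still unrolls but the
// check remains (on the affected compiler versions).
pub fn max10(x: &[u8; 10]) -> u8 {
    *x.iter().max().unwrap()
}

pub fn max11(x: &[u8; 11]) -> u8 {
    *x.iter().max().unwrap()
}
```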
@saethlin Yeah, the problem there is that we need to look through a very deep sequence of selects to determine that the final pointer is actually guaranteed to be non-null. This is going to hit a cutoff sooner or later.
I think we could support this without cutoffs inside SCCP, by starting from a NotConstant(Null) value lattice root for the nonnull argument and then propagating that across GEP inbounds. I'll try that when I have a chance.
https://github.com/llvm/llvm-project/pull/106090 should fix both the unrolled and not unrolled cases.
Looks like with the LLVM bump this is fixed in nightly 🎉
Seems like a plausible thing to have a test for, since we should be able to write a fairly focused one:
- Confirm that doing this on `&[u8]` does have the `call{{.+}}unwrap_failed`.
- Confirm that doing this on `&[u8; 1024]` does not have it.
(Notably: don't check for "no calls" or similar. There are loads of calls in the LLVM IR, like, unsurprisingly, to `llvm.umax`.)
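A rough sketch of what such a codegen test could look like (the exact FileCheck spelling and test-header details are guesses on my part; a real test in `tests/codegen` would also mark the functions `#[no_mangle]` so FileCheck can find the symbols):

```rust
//@ compile-flags: -Copt-level=3

// For a slice of unknown length, the panic path should still be present.
// CHECK-LABEL: @slice_max
// CHECK: call{{.+}}unwrap_failed
pub fn slice_max(x: &[u8]) -> u8 {
    *x.iter().max().unwrap()
}

// For a fixed-size array, the unwrap check should be optimized out.
// CHECK-NOT applies from here to the end of the function body.
// CHECK-LABEL: @array_max
// CHECK-NOT: call{{.+}}unwrap_failed
pub fn array_max(x: &[u8; 1024]) -> u8 {
    *x.iter().max().unwrap()
}
```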