libs-team icon indicating copy to clipboard operation
libs-team copied to clipboard

ACP: Avoid the common mistake of `for x in 0u8..`

Open crlf0710 opened this issue 7 months ago • 5 comments

Proposal

Problem statement

When a type has finite possible values and total ordering, it is tempting to write this code: for x in SOME_VAL.. {} However this is plain incorrect... It panics with overflow in debug mode and loops infinitely in release mode, for integers.

Linting against this pattern is helpful, but it's more meaningful to have a correct fix for the expected behavior.

Motivating examples or use cases

Integers and char types(see rust-lang #111514) and many other types might benefit from this.

Solution sketch

Maybe add a to_inclusive() api to open-ended ranges types, such that: (0u8..).to_inclusive() => (0u8..255u8)

Alternatives

TO BE ADDED.

Links and related work

  • https://github.com/rust-lang/rust/issues/111514

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

crlf0710 avatar Nov 26 '23 03:11 crlf0710

As I understand it, the behavior where RangeFrom<N> overflows at the extreme is a consequence of the range types implementing Iterator directly.

In order to prevent the overflow and stop at the max value, we would need to implement a flag similar to RangeInclusive.exhausted. But doing that would increase the size of of RangeFrom by a factor of 2 and would be a breaking change.

So the real solution to this is the same solution to the impl Copy for Range* issue: figure out how to transition the range types from directly implementing Iterator to implementing IntoIterator instead. Then the flag can just live on the iterator type instead of the Range type.

pitaj avatar Nov 26 '23 04:11 pitaj

All that said, it would be good to have a lint for this. Please open a clippy issue for that.

pitaj avatar Nov 26 '23 04:11 pitaj

There's nothing particularly wrong with them if you expect to find a value in that range and then break or return. Using RangeInclusive would be worse for performance.

the8472 avatar Nov 26 '23 12:11 the8472

Well, one problem is that the iterator can only ever yield 0..=MAX-1 because to yield the maximum it needs to increment the iterator state past it.

So the lint can actually just recommend 0..MAX, no need for inclusive.

pitaj avatar Nov 26 '23 20:11 pitaj

As another potential motivation, a bug was introduced to the image-gif crate because of an open-ended range 2 years ago. It's very easy to hit this with u8. https://github.com/image-rs/image-gif/commit/0215cf2210efb5f5d83cdfaf2ff411b693e031ae

The behavior is confusing to explain. If someone writes the non-panicking for loop, they would expect the collect code to work too but it panics in debug. It's easy to miss code like this because you might only run it in --release mode unless you have tests that cover it.

fn panic() {
    let x = (0..).zip([0; 256].iter()).collect::<Vec<(u8, &u8)>>();
    println!("{}", x.len());
}

fn no_panic() {
    let mut sum = 0u32;
    for (a, _b) in (0..).zip([0; 256].iter()) {
        sum += a;
    }
    println!("{sum}");
}

(playground link)

Personally, instead of an open-ended range iterator I almost always want (0..=MAX), so I write that to avoid possible panics. If I do use an open-ended range, I make sure to immediately follow with a take or zip with something that I know is shorter.

edit: Sorry if this is off-topic. I realize my post is more about the hazards of using open-ended ranges than the specific for x in 0u8.. pattern.

okaneco avatar Dec 02 '23 22:12 okaneco

playground link

That's very surprising, to say the least. Thank you for reporting it!

Sorry if this is off-topic. I realize my post is more about the hazards of using open-ended ranges than the specific for x in 0u8.. pattern.

Don't worry. I would dare to say for x in u16.. or even u32 should be avoided too. Long-running "real-time" systems must not panic for something like this

Rudxain avatar May 21 '24 03:05 Rudxain

Long-running "real-time" systems must not panic for something like this

It won't "panic" if you turn off -C overflow-checks. If the "real-time" system accepts a + b potentially panicking then they should have no problem with u8..'s panic. And also vice-versa.

kennytm avatar May 21 '24 05:05 kennytm

If the "real-time" system accepts a + b potentially panicking then they should have no problem with u8..'s panic

Correct! The example I gave was bad. But my intention/opinion is the same: the code is a potential footgun, no matter the int size (although u128 is "extreme enough" to not be concerning, in practice)

Rudxain avatar May 21 '24 09:05 Rudxain