[ACP] Impl `iter::Sum` (and `iter::Product`) for `num::Saturating<u*>`
Proposal
Problem statement
`core::iter::Sum` is implemented for types that can be created by summing a sequence of values. It's implemented for the various integer types, and also for the `core::num::Wrapping` wrapper type (which forces wrapping arithmetic). I propose that it should also be implemented for `core::num::Saturating`, which would force saturating arithmetic.
Per discussion (below), I'm only proposing to cover unsigned integer types. Summing over signed integers doesn't have an obviously correct behavior, and I don't want to get tangled up in that in my quest for saturating unsigned summation.
Motivating examples or use cases
I sometimes want to sum a set of integers that might cumulatively exceed some bound, without the risk of overflow (wrapping and crashing would both be bad).
For example, when assembling a record with max size `isize::MAX` from a set of smaller fields, I want to sum all the `usize` field sizes and then `if summed_size > (isize::MAX as usize) { return Err(...); }`. On 32-bit targets (e.g. WASM) this summation can easily overflow, so code that does `field_sizes().iter().sum()` is riskier than it looks.
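For illustration, this is roughly the shape of code I'd like to be able to write, assuming the proposed impl existed (`field_sizes()` and the error handling here are made up for the example):

```rust
use std::num::Saturating;

// Made-up stand-in for "the sizes of the record's fields".
fn field_sizes() -> Vec<usize> {
    vec![16, 4096, usize::MAX / 2, usize::MAX / 2]
}

fn check_record_size() -> Result<usize, String> {
    // With the proposed impl, the sum clamps at usize::MAX instead of
    // wrapping or panicking when the field sizes are too large.
    let Saturating(summed_size) = field_sizes()
        .into_iter()
        .map(Saturating)
        .sum::<Saturating<usize>>();
    if summed_size > (isize::MAX as usize) {
        return Err("record too large".to_string());
    }
    Ok(summed_size)
}
```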
Solution sketch
Adjust the `integer_sum_product!()` macro to add `Saturating<T>`, following the existing example of `Wrapping<T>`. This macro implements both `iter::Sum` and `iter::Product` for the relevant types.
For example, the macro-expanded impl of `iter::Sum` ends up looking like this:
```rust
impl core::iter::Sum for Saturating<$t> {
    fn sum<I: Iterator<Item = Saturating<$t>>>(iter: I) -> Self {
        // `a + b` on `Saturating` values uses `saturating_add` under the hood
        iter.fold(Saturating(0), |a, b| a + b)
    }
}
```
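For illustration, usage would then look something like this (assuming the impl above lands; the data is just an example):

```rust
use std::num::Saturating;

fn main() {
    let sizes = [Saturating(u32::MAX), Saturating(5), Saturating(7)];
    // Clamps at u32::MAX instead of wrapping or panicking.
    let total: Saturating<u32> = sizes.into_iter().sum();
    assert_eq!(total, Saturating(u32::MAX));
}
```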
Alternatives
The only alternative is "do nothing" -- both the trait and the type live in the standard library, so the orphan rule prevents users from implementing the glue impl themselves.
Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
> the macro-expanded impl of `iter::Sum` ends up looking like this

Should it short-circuit?
> Should it short-circuit?

Good idea. Given that saturation should usually be rare, the extra check may not be worth it -- unless there is no extra check at all, because the saturation is implemented with a branch anyway.
Hmm, for signed it can't short-circuit, right? Dunno if that means it shouldn't for unsigned either.
> Hmm, for signed it can't short-circuit

What should `u8::MAX + 2 - 1` yield?
Hmm yeah how should signed saturating sum be implemented? Saturating in from an unbounded integer type feels more intuitive to me.
I wouldn't mind implementing it only for unsigned types.
My intuition is that `[i8::MAX, 2, -1].iter().sum() == Saturating(i8::MAX - 1)`, but if that isn't universal then I'd rather implement only unsigned saturation.
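To make the signed ambiguity concrete, here is a small sketch using only the existing stable `Add` impl for `Saturating<i8>`; the fold below is what a non-short-circuiting sum would compute, and a short-circuiting one would give a different answer:

```rust
use std::num::Saturating;

fn main() {
    let xs = [Saturating(i8::MAX), Saturating(2), Saturating(-1)];

    // Folding with saturating adds: i8::MAX + 2 saturates to i8::MAX,
    // then -1 brings the total back down to i8::MAX - 1.
    let folded = xs.iter().copied().fold(Saturating(0i8), |a, b| a + b);
    assert_eq!(folded, Saturating(i8::MAX - 1));

    // A sum that stopped at the first saturation would instead report
    // i8::MAX, ignoring the trailing -1 -- a different answer.
}
```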
Promising exact short-circuiting may inhibit optimizations, because we have to stop taking items the moment it saturates. Without short-circuiting, or with sloppy short-circuiting, we can take items in batches.
Regarding short-circuiting, I think having `iter.sum()` consume different numbers of items depending on the output type would be surprising to most users. There are no other `iter::Sum` implementations that behave like that, right?
The existing short-circuiting impls (`Option` and `Result`) require the inputs to also be of those types, which raises the idea of having two impls (sketched below):

- `Iterator::<Item = u32>::sum::<Saturating<u32>>()` will consume the entire iterator, without short-circuiting.
- `Iterator::<Item = Saturating<u32>>::sum::<Saturating<u32>>()` will consume up to the entire iterator, returning immediately after reaching saturation ("exact short-circuiting").
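A rough sketch of the two behaviors as standalone functions (the names and bodies are made up for illustration; the real impls would live in core):

```rust
use std::num::Saturating;

// Variant 1 (hypothetical): plain u32 items, always drains the iterator.
fn sum_saturating(iter: impl Iterator<Item = u32>) -> Saturating<u32> {
    iter.fold(Saturating(0), |acc, x| acc + Saturating(x))
}

// Variant 2 (hypothetical): Saturating<u32> items, returns as soon as the
// accumulator hits u32::MAX ("exact short-circuiting").
fn sum_saturating_short(mut iter: impl Iterator<Item = Saturating<u32>>) -> Saturating<u32> {
    let mut acc = Saturating(0u32);
    while acc.0 < u32::MAX {
        match iter.next() {
            Some(x) => acc = acc + x,
            None => break,
        }
    }
    acc
}

fn main() {
    let xs = [1u32, u32::MAX, 2, 3];
    assert_eq!(sum_saturating(xs.iter().copied()), Saturating(u32::MAX));

    let mut it = xs.iter().copied().map(Saturating);
    assert_eq!(sum_saturating_short(&mut it), Saturating(u32::MAX));
    // Variant 2 stopped right after saturating, leaving items behind.
    assert_eq!(it.next(), Some(Saturating(2)));
}
```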
That leaves the fastest option on the table.
> That leaves the fastest option on the table.

Yep, that's true. A method on `Iterator` that might consume more or fewer items depending on internal compiler optimizations has too many sharp edges for me to be comfortable with, given the wide variety of `impl Iterator` types.
```rust
fn sum_and_next<I: Iterator<Item = u32>>(iter: &mut I) {
    // Reborrow so `iter` can still be used after the sum.
    let sum: Saturating<u32> = (&mut *iter).sum();
    println!("sum: {sum:?}");
    // What does this print? Can the output change depending on compiler
    // optimization level? Use of gccrs / rustc_codegen_gcc?
    println!("next: {:?}", iter.next());
}
```
There's probably room somewhere for a function shaped like (for example) `&[u32] -> Saturating<u32>` if people need to sum up some numbers really, really fast and don't need to worry about iterator item counting.
It wouldn't be a compiler optimization, it would be in the library. E.g. we could call `next_chunk` to get a bunch of items and have that autovectorized.
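Something like this nightly-only sketch is the idea (`next_chunk` is unstable, and the function name and chunk size here are made up; whether the inner loops actually vectorize is up to the optimizer):

```rust
// Nightly-only: Iterator::next_chunk is unstable.
#![feature(iter_next_chunk)]

use std::num::Saturating;

// Hypothetical library-internal helper: sum in fixed-size batches so the
// inner loop over each [u32; 64] array is a candidate for autovectorization.
fn batched_saturating_sum(mut iter: impl Iterator<Item = u32>) -> Saturating<u32> {
    let mut acc = Saturating(0u32);
    loop {
        match iter.next_chunk::<64>() {
            // Full batch: sum it with a plain array loop.
            Ok(chunk) => {
                for x in chunk {
                    acc = acc + Saturating(x);
                }
            }
            // Short (possibly empty) tail: sum the remainder and stop.
            Err(rest) => {
                for x in rest {
                    acc = acc + Saturating(x);
                }
                return acc;
            }
        }
    }
}
```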
> It wouldn't be a compiler optimization, it would be in the library. E.g. we could call `next_chunk` to get a bunch of items and have that autovectorized.

Sorry, there must be something I'm missing, because the API you describe seems very difficult to use correctly.
Given each of the following arrays:

- `[1, u32::MAX, 2, 3, 4, ...]`
- `[1, 1, u32::MAX, 2, 3, 4, ...]`
- `[1, 1, 1, u32::MAX, 2, 3, 4, ...]`
- `[1, 1, 1, 1, u32::MAX, 2, 3, 4, ...]`
I would expect the number of items consumed to be identical for all of them -- either it shouldn't short-circuit and print `None`, or it should short-circuit and print `Some(2)`.
The idea of an implementation of `Sum` that leaves the underlying iterator in an indeterminate state is not appealing.
> I would expect the number of items consumed to be identical for all of them

And what is that expectation derived from? If the documentation doesn't say whether it's short-circuiting or not, then where in the API contract did we make any promises? The fact that `u32` and `Option<u32>` process differently already means that the behavior is type-dependent and not a fixed property of `sum()`.
Which means that if we introduce a new impl, you wouldn't know its behavior anyway, and we can choose a new one.
> because the API you describe seems very difficult to use correctly.

"Pass the iterator by value, don't try to do things with the leftover state" is quite easy. Most iterator uses are consuming anyway, not by `&mut`.
> > I would expect the number of items consumed to be identical for all of them
>
> And what is that expectation derived from? If the documentation doesn't say whether it's short-circuiting or not, then where in the API contract did we make any promises?

I understand that the contract for `iter.sum::<Saturating<T>>()` could leave the state of the iterator undefined after a short-circuited saturation, if we choose to write the API docs that way.
My argument is that we should not do that. I would not propose an ACP that added such a hard-to-use API, nor would I implement it, nor use it if someone else implemented it.
> The fact that `u32` and `Option<u32>` process differently already means that the behavior is type-dependent and not a fixed property of `sum()`. Which means that if we introduce a new impl, you wouldn't know its behavior anyway, and we can choose a new one.

But both of those behave predictably. `Option<u32>` will not continue reading past the first `None` it encounters.
> "Pass the iterator by value, don't try to do things with the leftover state" is quite easy. Most iterator uses are consuming anyway, not by `&mut`.

Standard library functions without an `unsafe` should not have unspecified behavior under normal operating conditions.
> I would not propose an ACP that added such a hard-to-use API

I'm confused about what's so hard to use here. You pass in an iterator. You get out a saturated result, and it's as fast as a hand-optimized loop over a slice.
> Standard library functions without an `unsafe` should not have unspecified behavior under normal operating conditions.

That already isn't true for fairly central std features like `HashMap` iteration order or float results. Or the order in which predicates are evaluated for sorting. There are plenty more unspecified implementation details.
> I'm confused about what's so hard to use here. You pass in an iterator. You get out a saturated result, and it's as fast as a hand-optimized loop over a slice.

I would expect auto-vectorization and batching in a function like `u32::saturating_sum_slice(xs: &[u32]) -> u32`, since the number of items read from the input isn't observable.
For `Iterator::sum()`, having the number of items consumed depend on both the content and the (invisible) chunk size means that I have to be extra careful when using it, which defeats the purpose of having it in the first place (vs. copy-pasting a helper function).
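To be concrete about what I mean, a sketch of such a slice-based helper (the name is hypothetical, not an existing API), shown here as a free function:

```rust
use std::num::Saturating;

// Hypothetical helper: sum a slice with saturating arithmetic. Because the
// input is a slice, "how many items were consumed" is not observable, so the
// implementation is free to batch or vectorize internally.
fn saturating_sum_slice(xs: &[u32]) -> Saturating<u32> {
    xs.iter().fold(Saturating(0), |acc, &x| acc + Saturating(x))
}

fn main() {
    assert_eq!(saturating_sum_slice(&[1, u32::MAX, 2]), Saturating(u32::MAX));
}
```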
Exhaustion behavior already isn't universal. Some iterators short-circuit, others don't. So you need to read the documentation for each thing you use. `sum` doesn't make a guarantee in either direction, and we already established that it's type-dependent.
So you already have to be "careful" if you're trying to look at the exhaustion state, which in my experience is not that common. Consuming the iterator is way more common. If you're writing generic code over `Sum` that depends on exhaustion behavior, then your code is already wrong and you weren't careful enough.
~~Arguably your code would be wrong anyway because we're not guaranteeing short-circuiting for `Option`, so relying on that would also be wrong.~~
So I don't understand how this adds anything "extra" to the carefulness required.
You really just shouldn't be looking at the exhaustion state, even without this ACP.
I'll set aside the discussion of whether consuming items from an `Iterator` should be predictable, because it doesn't seem like it will be possible for you and me to reach consensus on that topic.
Regarding this:
> Arguably your code would be wrong anyway because we're not guaranteeing short-circuiting for `Option`, so relying on that would also be wrong.

Note that sum for `Option` does in fact guarantee exact short-circuiting: https://doc.rust-lang.org/std/option/enum.Option.html#method.sum

> Takes each element in the `Iterator`: if it is a `None`, no further elements are taken, and the `None` is returned. Should no `None` occur, the sum of all elements is returned.
Ah, that didn't show up on the `Sum` trait's doc page.
`Iterator::sum` takes `self` by value. It shouldn't be possible for the caller to tell if every element was consumed or not without side effects happening from the iteration. If you actually rely on every element getting evaluated, then `sum` doesn't seem like the right function and you might be better off with an explicit `fold` where you can add explicit side effects to the summation.
> `Iterator::sum` takes `self` by value. It shouldn't be possible for the caller to tell if every element was consumed or not without side effects happening from the iteration. If you actually rely on every element getting evaluated, then `sum` doesn't seem like the right function and you might be better off with an explicit `fold` where you can add explicit side effects to the summation.

As discussed in this thread, the side effects from iteration are observable:
```rust
fn main() {
    let xs: &[Option<u32>] = &[Some(1), Some(2), Some(3), None, Some(5), Some(6)];
    let mut i = xs.iter();
    let s: Option<u32> = (&mut i).map(|x| *x).sum();
    println!("{s:?}"); // None
    // The sum short-circuited at the None, so unconsumed items remain:
    let extra: Vec<&Option<u32>> = i.collect();
    println!("{extra:?}"); // [Some(5), Some(6)]
}
```
The options for behavior are:
1. Do not short-circuit -- consume the entire iterator, even if saturation is reached. This enables batching of additions, but might perform more addition operations than necessary.
2. Exact short-circuiting -- `sum()` must stop iterating after saturation is reached. This provides predictable and deterministic behavior, but inhibits batching of additions.
   - The existing behavior of `sum()` for `Option<T>` is exact short-circuiting.
3. Ambiguous short-circuiting -- `sum()` may stop iterating after saturation is reached, but is not required to, and the number of items consumed is unspecified. This would be the most efficient implementation, at the cost of potentially non-deterministic behavior (varying between compilation modes, platforms, etc.).
I'm comfortable with either (1) or (2). I would not want (3) to be the default behavior of the standard library's `sum()`, because it would be extremely surprising for such a basic operation to have unspecified behavior.
fwiw I personally am most comfortable with (1) -- I wouldn't expect a change from `i64` to `Saturating<i64>` to change the behavior of my code, other than that all numeric operations saturate.
> My intuition is that `[i8::MAX, 2, -1].iter().sum() == Saturating(i8::MAX - 1)`, but if that isn't universal then I'd rather implement only unsigned saturation.

That would be my intuition as well -- might be worth making a forum post to get some more input on how others would intuitively think about this?
(1) and (3) are the options that enable vectorization in general. (2) would only allow it in special cases where an implementation is certain that no side-effects exist and the iterator could be rewound.
(3) would result in the highest performance.
For option (1), an implementation could at least try to exhaust the iterator more cheaply by calling `advance_by(usize::MAX)` (repeatedly if necessary) once it has saturated. Though it'll require some balancing to make that branch not too expensive in the case where it doesn't saturate. And of course it would only help for adapters where `advance_by` actually is faster.
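A nightly-only sketch of that idea (`advance_by` is unstable and the function name here is made up): once an unsigned accumulator saturates, further additions can't change it, so the rest of the iterator can just be advanced:

```rust
// Nightly-only: Iterator::advance_by is unstable.
#![feature(iter_advance_by)]

use std::num::Saturating;

fn saturating_sum_drain(mut iter: impl Iterator<Item = u32>) -> Saturating<u32> {
    let mut acc = Saturating(0u32);
    while let Some(x) = iter.next() {
        acc = acc + Saturating(x);
        if acc.0 == u32::MAX {
            // Saturated: drain the rest without summing. Ok(()) means
            // usize::MAX more items were skipped and there may be more,
            // so repeat; Err(_) means the iterator is exhausted.
            while iter.advance_by(usize::MAX).is_ok() {}
            break;
        }
    }
    acc
}
```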
Actually, I think saturation short-circuiting is less surprising than `Option` short-circuiting: the former only affects the internal state of the iterator after the sum, which shouldn't be relied upon anyway, while the latter has a reasonable non-short-circuiting implementation that would change the direct output, namely treating `None` values as the identity element (0 for sum, 1 for product). Either behavior is trivial to get manually anyway: precede the sum call with `map_while(id)` for short-circuiting, or `filter_map(id)` for non-short-circuiting.
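Spelled out, those two manual variants for `Option` items look like this (stable code, nothing new proposed here):

```rust
use std::convert::identity;

fn main() {
    let xs = [Some(1u32), Some(2), None, Some(4)];

    // Short-circuiting: stop at the first None, summing only 1 + 2.
    let short: u32 = xs.iter().copied().map_while(identity).sum();
    assert_eq!(short, 3);

    // Non-short-circuiting: skip None (treat it as the identity, 0),
    // summing 1 + 2 + 4.
    let skipped: u32 = xs.iter().copied().filter_map(identity).sum();
    assert_eq!(skipped, 7);
}
```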