ndarray
ndarray copied to clipboard
.mean() on ArrayView1<u64>: panicked at 'attempt to add with overflow'
I wanted to calculate a mean over an ArrayView1<u64>
with 1000 values and encounter a panic:
thread 'main' panicked at 'attempt to add with overflow', /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ops/arith.rs:107:1
Is this intended? Or should the u64
values internally be casted to u128
when calculating the mean?
The following code reproduces the error:
use ndarray::*;
fn main() {
let mut data_vec: Vec<u64> = Vec::new();
for _ in 1..1000 {
data_vec.push(132542363533234561)
}
let data = data_vec.as_slice();
let view: ArrayView1<u64> = ArrayView1::from_shape(data.len(), data).unwrap();
println!("{:?}", view.mean());
}
I think the reason for this error is that mean() uses the public sum() function, which panics if the result does not fit into the element type. However, the mean result should always fit into the element type, so mean() maybe shouldn't use the public sum() function.
u64 is an integer type, so it doesn't really have the fidelity for a mean. What kind of result is expected? mean of [1, 3, 3, 8] would be 3 in u64
. Is this acceptable? (In the reals the answer is 3.75).
@bluss yes, it should be just the float without the fractional part, just the behavior the mean function has when calculating the mean of few u64's (not 1000 like in the example)
We could potentially move to an iterative calculation of mean a la http://www.heikohoffmann.de/htmlthesis/node134.html I haven't worked through how this will impact accuracy for integral means.
@xd009642 would casting of the sum results if they exceed u64 be an option?
My gut feeling that the added complexity will impact performance and having the user manually choose to cast and average would be better - opt into the slower implementation. But with performance stuff it's always best to measure instead of guess
@xd009642 I'm using mean_axis at the moment to calculate the means of groups of u64
values along an axis. Can you imagine any workaround at the moment to not encounter the overflow? (Without copying all values)
fold
or map_axis
, I think, should do It (Edited)
@bluss thanks! So maybe the first step in this issue should be adding the mentioned panic case to the functions documentation: https://github.com/rust-ndarray/ndarray/blob/4d9641db085e64729d52050ea0c24e4a541629a7/src/numeric/impl_numeric.rs#L70 What do you think?
We need to do a more general take on this but it's a good idea to have a note on .sum() and .mean() about this.
- It's a debug assertion - only fires when debug assertions are enabled, not in release mode. So it needs to be documented that way, it's not like the "mandatory" panics.
- It's a general problem for all our arithmetic operations on
u64
arrays and similar types - we have debug assertions if addition would wrap et.c.
Personally, I think the best option would be to allow the user to specify the type of the accumulator(s) used by sum
/mean
/sum_axis
/mean_axis
, with an API like this:
pub fn sum_axis<B>(&self, axis: Axis) -> Array<B, D::Smaller>
where
A: Into<B>,
B: Clone + Zero + Add<Output = B>,
D: RemoveAxis,
This would be useful for f32
(to use a f64
accumulator) in addition to integers, shouldn't affect the existing performance, and should provide good performance even when the accumulator type is different from the element type. It's unfortunate that we can't specify a default for the B
type; in many cases, the user would now have to manually specify the accumulator type. That said, the .sum()
method on Iterator
is the same way.
Instead of allowing the user to use a larger accumulator (which could still overflow), it is possible to implement an integer mean without overflow, but it would definitely be slower than the existing implementation and would be specific to integer means. If we add something like this, I think it should be a separate method. Here's my initial idea for an integer-mean-without-overflow:
- Let
A
be the element type, and letn: isize
be the number of elements. - Initialize two accumulators:
-
let mut whole: A = A::zero();
-
let mut numerator: isize = 0;
-
- For each element, let's say that the element divided by
n
isa
and the remainder isb
.- Add
a
to thewhole
accumulator. - Add
numerator + b
divided byn
to thewhole
accumulator, and assign the remainder to thenumerator
accumulator. (numerator + b
can overflow, so a more careful series of computations is necessary to perform these operations without overflow. I haven't thought through the details, but I'm confident it's possible with a temporary variable or two.)
- Add
- Return
whole
. (Optionally, you could usenumerator
to round the result instead of discarding the fractional portion.)
Basically, the idea is to use a mixed fraction as the accumulator (storing the whole part and the numerator), and add each element divided by n
to the accumulator to compute the mean.
I personally would volunteer to implement this, but I never contributed here so I might need some Information where to start when making such a change. @jturner314
I got exactly the same panic just now. I am interested in the solution here. If you want @yannickfunk we could try and iterate over the implementation.
If you are interested in how to make a start (I never contributed to this crate, but did to others). Clone the repo, find the location of the current mean function and start writing an implementation below. Writing a heap of tests, that try to find the hard paths through the code will make you (and the rest of the world) trust the implementation. Once you are sufficiently done open a PR and request a code review (feel free to add me in if you want).