bottleneck
bottleneck copied to clipboard
[BUG] Numerical inaccuracy in summation based routines
Describe the bug
Bottleneck's implementation of algorithms containing a summation yields different results than numpy for floats. This seems to stem from the fact that numpy uses some sort of compensated summation algorithm to increase accuracy, while bottleneck uses a straight sum, e.g.:
bottleneck/src/reduce_template.c
FOR {
const npy_DTYPE0 ai = AI(DTYPE0);
if (!bn_isnan(ai)) {
asum += ai;
}
To Reproduce
import numpy as np
import bottleneck
# adding float32.eps to 2.f gives 2.f so e.g. Kahan-summation is needed to get result != 2.f
arr = np.hstack(([np.float32(2.)], np.repeat(np.finfo(np.float32).eps, 100000).astype(np.float32)))
print('numpy: ', np.nansum(arr))
print('bottleneck: ', bottleneck.nansum(arr))
numpy: 2.011919
bottleneck: 2.0
System: Linux-5.11.11-arch1-1-x86_64-with-glibc2.33 Python 3.9.2 (default, Feb 20 2021, 18:40:11) [GCC 10.2.0] bottleneck 1.3.2
Expected behavior As implementations can be switched due to non-obvious reasons (like a fallback to numpy routines in the case of non-native byteorder), results between bottleneck-routines and numpy should match. If a complete match of results is not attainable, the documentation should state clearly that bottleneck does not always reproduce numpy results.
Additional context https://github.com/astropy/astropy/issues/11492
If I draw up a PR with Kahan summation, does it have a chance of being accepted? Or will bottleneck refuse to take a performance hit for the sake of precision?
@sebasv Sorry for the lack of response here, I've had significantly less bandwidth this year.
I believe it's possible to match numpy's output while also being faster. I have some local commits that are unfinished that accomplish part of this - biggest issue is that I would want to fix this for every function in one release.
Thank you for the quick response! Let me know if I can help in some capacity.
If I draw up a PR with Kahan summation, does it have a chance of being accepted? Or will bottleneck refuse to take a performance hit for the sake of precision?
As you seem to have put a little more work into this than I did, do you happen to know if numpy uses Kahan, pairwise summation or something completely different ? I couldn't really follow the dispatch logic...
I believe now that Numpy uses pairwise summation (see numpy/numpy#3685). Naive summation has a O(n) error, pairwise has an O(log(n)) error and Kahan has an O(1) error. With a large base case, Naive and pairwise have equivalent speed (just minimal recursion overhead). Kahan requires about 4 times the number of additions. So perhaps pairwise is the best fit for Bottleneck?