Catastrophic accuracy loss in large float32 arrays for nanmean and nanstd

Open • prutschman-iv opened this issue 1 year ago • 2 comments

Describe the bug: Starting somewhere between 10 million and 50 million elements, the bn.nanmean and bn.nanstd functions appear to suffer a catastrophic loss of accuracy on float32 data.

To Reproduce: This code creates float32 arrays of increasing size and compares the results of the NumPy and Bottleneck versions of nanmean and nanstd:

import numpy as np
import bottleneck as bn
print(f'{np.__version__=} {bn.__version__=}')
million = 10**6
for size in (million, 10 * million, 50 * million, 100 * million):
    rand_data = np.random.random(size=size).astype(np.float32)
    print(f"{size}")
    print("    mean\t", np.nanmean(rand_data), bn.nanmean(rand_data))
    print("     std\t", np.nanstd(rand_data), bn.nanstd(rand_data))

When I run it, I get:

np.__version__='1.24.0' bn.__version__='1.4.1'
1000000
    mean         0.5003439 0.5003493428230286
     std         0.28887847 0.28882330656051636
10000000
    mean         0.49992886 0.49994951486587524
     std         0.28866056 0.28725674748420715
50000000
    mean         0.5000019 0.33554431796073914
     std         0.28868446 0.30973944067955017
100000000
    mean         0.4999724 0.16777215898036957
     std         0.2886786 0.38657501339912415

Versions:

Package           Version
----------------- --------------------
astropy           6.1.4
astropy-iers-data 0.2024.10.14.0.32.55
Bottleneck        1.4.1
numpy             1.24.0
packaging         24.1
pip               24.0
pyerfa            2.0.1.4
PyYAML            6.0.2
setuptools        69.2.0
wheel             0.43.0

Expected behavior: I expected the differences between numpy and Bottleneck to be zero, or at least small relative to the size of the result.
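
To put a number on "small relative to the size of the result", here is a minimal sketch that computes the relative difference between the two means, using only the values printed in the output of this report. `rel_diff` is a hypothetical helper name, and NumPy's result is treated as the reference purely for the sake of the comparison.

```python
def rel_diff(reference: float, candidate: float) -> float:
    """Relative difference of candidate with respect to reference."""
    return abs(candidate - reference) / abs(reference)


# (np.nanmean, bn.nanmean) pairs copied from the output in this report.
reported_means = {
    1_000_000: (0.5003439, 0.5003493428230286),
    10_000_000: (0.49992886, 0.49994951486587524),
    50_000_000: (0.5000019, 0.33554431796073914),
    100_000_000: (0.4999724, 0.16777215898036957),
}

for size, (np_mean, bn_mean) in reported_means.items():
    print(f"{size:>11,d}  relative difference = {rel_diff(np_mean, bn_mean):.2e}")
```

For the two smaller arrays the relative difference stays below 1e-4, while for the two larger ones it is a large fraction of the result itself.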

Additional context: I encountered this while trying to track down https://github.com/astropy/astropy/issues/17185. https://github.com/astropy/astropy/issues/11492 may be related, but there the accuracy loss appeared smaller.

prutschman-iv, Oct 15 '24 23:10

This might be related: https://github.com/pydata/bottleneck/issues/164

rdbisme, Oct 18 '24 22:10

Does this solve the problem? https://github.com/pydata/bottleneck/pull/414

rdbisme, Oct 18 '24 22:10