Difference between `ak.std()` and `np.std()`

Open taehyounpark opened this issue 4 months ago • 5 comments

Version of Awkward Array

2.8.1

Description and code to reproduce

I'm seeing a NaN vs. finite-valued std() difference between NumPy and Awkward on the exact same array.

NumPy call: `np.std(ak.to_numpy(ak.flatten(X)))`

Awkward call: `ak.std(ak.flatten(X))`

Below are some example outputs:

  np.mean() = 1.0000672340393066
  ak.mean() = 1.0000670059867527
  np.std() = 2.9891753001720645e-05
  ak.std() = 0.0005289947633893411
  np.mean() = 0.9998437166213989
  ak.mean() = 0.9998438106560559
  np.std() = 0.0003517308796290308
  ak.std() = nan

The arrays contain float32 values in both the NumPy and Awkward cases. I'm surprised to see such a large difference in std(), especially in the case where Awkward reports nan and NumPy reports a finite value. There are no NaNs in any of the arrays, so I would expect a finite standard deviation.

taehyounpark avatar Aug 21 '25 12:08 taehyounpark

Seems to me like the same problem as https://github.com/scikit-hep/awkward/issues/3525. NumPy will convert the input to float64 in such functions before summing the array. Can you let me know what happens if you cast the array to `np.float64` before applying mean and std in Awkward? You can use `ak.values_astype` for that.

ikrommyd avatar Aug 21 '25 12:08 ikrommyd

Yes, this fixes the issue, sorry for the duplicate! So the (more) correct way would be to indeed use float64 for calculating variances, then?

taehyounpark avatar Aug 21 '25 13:08 taehyounpark

The more correct way, in my opinion, would be for Awkward to do that for you internally. The problem at the moment is that NumPy does that casting implicitly inside the "loop over the array elements". In Awkward Array, we'd currently have to cast to float64 before passing the array to the summation function. That would increase your memory usage because you'd create a copy of the array first and then sum it. I'm doing this in https://github.com/scikit-hep/awkward/pull/3527 but the memory spike is a problem. I think the more correct approach would be for Awkward to have a summation kernel that casts implicitly inside the kernel, which would avoid the memory spike.
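To illustrate the trade-off in plain NumPy terms, here is a minimal sketch of accumulating in float64 without ever materializing a full float64 copy. It is not Awkward's kernel and not the approach taken in the linked PR; `chunked_std_float64` and `chunk_size` are hypothetical names, and the function assumes a flat 1-D input.

```python
import numpy as np


def chunked_std_float64(values, chunk_size=65536):
    """Population std of a 1-D array, accumulated in float64 chunk by chunk.

    Only one chunk is ever upcast at a time, so the extra memory is
    bounded by chunk_size rather than by the full array length.
    """
    values = np.asarray(values)
    n = values.size
    if n == 0:
        return float("nan")

    # First pass: float64 sum of each chunk (no full-size float64 copy).
    total = 0.0
    for start in range(0, n, chunk_size):
        total += values[start:start + chunk_size].sum(dtype=np.float64)
    mean = total / n

    # Second pass: sum of squared deviations, again one chunk at a time.
    sq_dev = 0.0
    for start in range(0, n, chunk_size):
        chunk = values[start:start + chunk_size].astype(np.float64)
        sq_dev += float(np.sum((chunk - mean) ** 2))

    return float(np.sqrt(sq_dev / n))


# Usage on synthetic float32 data clustered near 1.0.
rng = np.random.default_rng(0)
x32 = rng.normal(1.0, 3e-4, size=200_000).astype(np.float32)

chunked = chunked_std_float64(x32)
reference = float(np.std(x32.astype(np.float64)))  # full float64 copy
print(chunked, reference)
```

The reference line makes the full float64 copy that the chunked version avoids; the two results should agree to within floating-point rounding.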

ikrommyd avatar Aug 21 '25 13:08 ikrommyd

I'd say it's not a bug, but a feature request.

ianna avatar Sep 19 '25 16:09 ianna

https://github.com/scikit-hep/awkward/pull/3653 may have improved this for float32s. @taehyounpark could you run your example with awkward main branch and post the numpy/awkward comparison?

ikrommyd avatar Sep 19 '25 16:09 ikrommyd