pandas2 Aggregation identity on entirely missing data

Aggregation identity on entirely missing data

Open chris-b1 opened this issue 8 years ago • 1 comments

potentially related to #9

numpy ufuncs have an identity, which pandas follows with respect to missing data.

In [33]: np.sum([], dtype='float64')
Out[33]: 0.0

In [35]: np.nansum([np.nan], dtype='float64')
Out[35]: 0.0

In [36]: pd.Series([np.nan]).sum()
Out[36]: 0.0

I don't feel that strongly one way or the other, but there's definitely a case to be made that [36] should be NA. The number of bug reports indicates that, at minimum, people get tripped up by this; xref https://github.com/pydata/pandas/issues/9422

So we could consider modifying the identity concept for pandas 2.0, since there will be less binding to numpy semantics.
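
For reference, here is a minimal sketch of the two behaviours side by side. It assumes a pandas release (0.22 or later) in which `sum` accepts a `min_count` argument; that argument did not exist when this issue was opened, so treat this as an illustration of the "[36] should be NA" option rather than as the proposal itself. The variable names are just for the example.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")  # no values at all
all_missing = pd.Series([np.nan])       # values present, but every one is NA

# numpy-style identity: a sum over nothing is the additive identity, 0.0
identity_sum = empty.sum()

# default skipna=True drops the NaN first, so this is again a sum over nothing
skipna_sum = all_missing.sum()

# min_count=1 requires at least one non-NA value, otherwise the result is NA
na_sum = all_missing.sum(min_count=1)

print(identity_sum, skipna_sum, na_sum)
```

With `min_count=1` the empty case and the all-missing case both come back as NA, so this alone would not keep [33] at 0.0 while making [36] NA; distinguishing the two would need separate handling.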

Sep 14 '16 20:09 chris-b1

I'm +1 on [36] being NA. [33] is probably correct. At minimum it would be useful to document this behavior carefully.

Sep 15 '16 04:09 wesm
