pandas2
Aggregation identity on entirely missing data
potentially related to #9
NumPy ufuncs have an identity, which pandas
follows with respect to missing data.
In [33]: np.sum([], dtype='float64')
Out[33]: 0.0
In [35]: np.nansum([np.nan], dtype='float64')
Out[35]: 0.0
In [36]: pd.Series([np.nan]).sum()
Out[36]: 0.0
I don't feel that strongly one way or the other, but there's definitely a case to be made that [36] should be NA. The number of bug reports indicates that, at minimum, people get tripped up by this; xref https://github.com/pydata/pandas/issues/9422
So we could consider modifying the identity concept for pandas 2.0, since there will be less binding to NumPy semantics.
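To make the two options concrete, here is a rough pure-Python sketch. The helper `nansum` and its `all_missing_is_na` flag are hypothetical names made up for illustration; they are not pandas or NumPy APIs, and this is not how either library implements its reductions. It assumes the input is a list of floats with NaN standing in for NA.

```python
import math

def nansum(values, all_missing_is_na=False):
    """Sum a list of floats, skipping NaN (hypothetical helper for illustration)."""
    present = [v for v in values if not math.isnan(v)]
    # Possible new rule: the input had entries, but every one of them was missing.
    if all_missing_is_na and len(values) > 0 and not present:
        return math.nan
    # Current rule: a sum over nothing falls back to the additive identity.
    return math.fsum(present)

print(nansum([]))                                       # 0.0, like [33]
print(nansum([math.nan]))                               # 0.0, like [35] and [36] today
print(nansum([math.nan], all_missing_is_na=True))       # nan, the alternative for [36]
print(nansum([1.0, math.nan], all_missing_is_na=True))  # 1.0, partially missing is unaffected
```

Note that in this sketch the empty case still returns the identity even with the flag set; only the "non-empty but entirely missing" case changes.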
I'm +1 on [36] being NA. [33] is probably correct. At minimum it would be useful to document this behavior carefully.