dataframe.GroupBy.count() to align with pandas
Currently, dataframe.GroupBy.count() is an alias of dataframe.GroupBy.size(), but it should align with the pandas API.
More specifically, it should return a dataframe with one row per group and one column per non-key column, where each entry is the number of non-NaN values in that column for that group. Because NaN values are excluded, the counts can differ from column to column.
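To make the requested semantics concrete, here is a minimal pure-Python sketch. It is not arkouda's or pandas' implementation; the helper name `groupby_count` and the dict-of-lists input layout are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def groupby_count(columns, key):
    """Count non-NaN values per group, separately for each non-key column.

    `columns` is a dict mapping column names to equal-length lists
    (a hypothetical stand-in for a dataframe); `key` names the group column.
    """
    value_names = [name for name in columns if name != key]
    counts = defaultdict(lambda: {name: 0 for name in value_names})
    for i, group in enumerate(columns[key]):
        group_counts = counts[group]
        for name in value_names:
            value = columns[name][i]
            # Skip NaN entries; everything else counts toward the total.
            if not (isinstance(value, float) and math.isnan(value)):
                group_counts[name] += 1
    return dict(counts)


nan = float("nan")
data = {
    "gb_id": ["A", "B", "A", "A", "B"],
    "nums1": [1.0, 2.0, nan, nan, nan],
    "nums2": [3.0, 4.0, 5.0, nan, nan],
}
result = groupby_count(data, "gb_id")
print(result)
# {'A': {'nums1': 1, 'nums2': 2}, 'B': {'nums1': 1, 'nums2': 1}}
```

Group A has three rows, yet its counts are 1 and 2, which is why count() cannot simply reuse the single per-group number that size() produces.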
The following example demonstrates that the arkouda and pandas behaviors do not currently align.
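from arkouda import DataFrame  # assumes an active connection to an arkouda server

ak_df = DataFrame(
    {
        "gb_id": ["A", "B", "A", "A", "B"],
        "nums1": [1.0, 2.0, float("nan"), float("nan"), float("nan")],
        "nums2": [3.0, 4.0, 5.0, float("nan"), float("nan")],
    }
)
display(ak_df)
pd_df = ak_df.to_pandas()
display(pd_df)
pd_df.groupby('gb_id').count()  # pandas: per-column counts that exclude NaN
ak_df.groupby('gb_id').count()  # arkouda: currently the same result as size()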
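For reference, the pandas side of the comparison can be reproduced without an arkouda server. The sketch below builds the same data directly in pandas (rather than via to_pandas()) and contrasts size() with count(); the variable names `sizes` and `counts` are only for illustration.

```python
import pandas as pd

nan = float("nan")
pd_df = pd.DataFrame(
    {
        "gb_id": ["A", "B", "A", "A", "B"],
        "nums1": [1.0, 2.0, nan, nan, nan],
        "nums2": [3.0, 4.0, 5.0, nan, nan],
    }
)

# size() returns one number per group: the row count, NaN rows included.
sizes = pd_df.groupby("gb_id").size()
print(sizes.to_dict())   # {'A': 3, 'B': 2}

# count() returns one number per group and column: the non-NaN count.
counts = pd_df.groupby("gb_id").count()
print(counts.to_dict())  # {'nums1': {'A': 1, 'B': 1}, 'nums2': {'A': 2, 'B': 1}}
```

The second result is the shape and content that the arkouda count() is being asked to match.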
Relevant pandas docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html