dataframe.GroupBy.count() to align with pandas

Open • ajpotts opened this issue 1 year ago • 1 comment

Right now, dataframe.GroupBy.count() is an alias of dataframe.GroupBy.size(), but it should align with the pandas API.

More specifically, it should return a dataframe with one row per group and one column per non-key column, where each entry is the count of values in that column for that group. The count should exclude NaN values, so different output columns can hold different counts for the same group.
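The intended semantics can be sketched in plain NumPy. This is only an illustration of "count non-NaN values per group, per column"; it is not arkouda's implementation, and `count_non_nan_per_group` is a hypothetical helper name introduced here.

```python
import numpy as np


def count_non_nan_per_group(keys, values):
    """Hypothetical helper: count the non-NaN entries of `values` per unique key."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Map every row to the index of its group among the sorted unique keys.
    unique_keys, group_idx = np.unique(keys, return_inverse=True)
    # 1.0 where the value is present, 0.0 where it is NaN.
    present = (~np.isnan(values)).astype(float)
    # Sum the presence mask within each group.
    counts = np.bincount(group_idx, weights=present, minlength=len(unique_keys))
    return unique_keys, counts.astype(np.int64)


nan = float("nan")
gb_id = ["A", "B", "A", "A", "B"]
groups, nums1_counts = count_non_nan_per_group(gb_id, [1.0, 2.0, nan, nan, nan])
_, nums2_counts = count_non_nan_per_group(gb_id, [3.0, 4.0, 5.0, nan, nan])
```

Applying the helper once per value column gives one count column per input column, which is why the columns of the result can differ from each other and from the plain group sizes.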

The following example demonstrates that the arkouda and pandas behaviors do not currently align.

ak_df = DataFrame(
    {
        "gb_id": ["A", "B", "A", "A", "B"],
        "nums1": [1.0, 2.0, float("nan"), float("nan"), float("nan")],
        "nums2": [3.0, 4.0, 5.0, float("nan"), float("nan")],
    }
)
display(ak_df)

pd_df = ak_df.to_pandas()
display(pd_df)

pd_df.groupby('gb_id').count()

ak_df.groupby('gb_id').count()
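For reference, the pandas side of this comparison can be reproduced without an arkouda server. The sketch below builds the same data directly in pandas (an assumption made here for self-containment, instead of converting with to_pandas()) and contrasts count() with size().

```python
import numpy as np
import pandas as pd

pd_df = pd.DataFrame(
    {
        "gb_id": ["A", "B", "A", "A", "B"],
        "nums1": [1.0, 2.0, np.nan, np.nan, np.nan],
        "nums2": [3.0, 4.0, 5.0, np.nan, np.nan],
    }
)

grouped = pd_df.groupby("gb_id")
counts = grouped.count()  # DataFrame: non-NaN count per group, per column
sizes = grouped.size()    # Series: total number of rows per group

print(counts)
print(sizes)
```

Here count() reports 1 and 2 for group A in nums1 and nums2, while size() reports 3 for the same group, so the two methods are not interchangeable whenever NaN values are present.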

ajpotts, Jan 05 '24 14:01

Relevant pandas docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html

ajpotts, Mar 15 '24 13:03