
Not taking into account None values in group_by

Open: pachoning opened this issue 5 years ago • 1 comment

Hi!

I have a dataframe with three variables: id, category and age.

df = pd.DataFrame({'id' : [1,2,3,4], 'category' : ['a', 'a', 'b', None], 'age': [12,54,67,89]})

I am performing a group_by on the category variable, which contains a None value:

df >> group_by(X.category) >> summarize(total = n(X.id))

The result is the following:

category total
a 2
b 1

Shouldn't the result be the following instead?

category total
a 2
b 1
None 1

Even if I convert the None to np.nan, I get the same result.
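For reference, the same behavior can be reproduced in plain pandas, which by default drops rows whose grouping key is missing (None or NaN). The sketch below uses only pandas, not dfply, and shows one possible workaround: replacing the missing key with a placeholder label before grouping. The label 'missing' is an arbitrary choice made for this illustration, and the sketch assumes dfply's group_by inherits the pandas default.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'category': ['a', 'a', 'b', None],
                   'age': [12, 54, 67, 89]})

# Default pandas behavior: rows with a missing group key are dropped.
default_counts = df.groupby('category')['id'].count()

# Workaround: give missing keys an explicit placeholder label first.
# 'missing' is a hypothetical label chosen for this sketch.
filled = df.assign(category=df['category'].fillna('missing'))
filled_counts = filled.groupby('category')['id'].count()

print(default_counts)
print(filled_counts)
```

With the placeholder in place, the fourth row forms its own group instead of disappearing from the summary.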

pachoning, Oct 31 '19 12:10

Is there a way to pass a dropna=False argument to group_by() so that NA keys are kept? (pandas reference: https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.1.0.html#allow-na-in-groupby-key) cc: @kieferk
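As a minimal sketch of the pandas feature linked above: since pandas 1.1.0, groupby accepts a dropna parameter, and dropna=False keeps missing keys as their own group (the default, dropna=True, drops them). This example uses pandas directly; whether dfply's group_by() can forward such an argument is not established in this thread.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'category': ['a', 'a', 'b', None],
                   'age': [12, 54, 67, 89]})

# Requires pandas >= 1.1.0: dropna=False keeps the missing key as a group.
counts = df.groupby('category', dropna=False)['id'].count()

# The missing key appears as an NA entry in the result index.
na_count = counts[counts.index.isna()].iloc[0]

print(counts)
```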

sundarcf, Jan 17 '22 16:01