dfply Not taking into account None values in group

Not taking into account None values in group_by

Open pachoning opened this issue 5 years ago • 1 comments

Hi!

I have a dataframe with three variables: id, category and age.

df = pd.DataFrame({'id' : [1,2,3,4], 'category' : ['a', 'a', 'b', None], 'age': [12,54,67,89]})

I am performing a group_by on the category variable, which contains a None value:

df >> group_by(X.category) >> summarize(total = n(X.id))

The result is:

category	total
a	2
b	1

Shouldn't the result be the following?

category	total
a	2
b	1
None	1

Even if I convert the None to np.nan, I get the same result.
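One possible workaround, sketched here in plain pandas rather than dfply, is to replace the missing category with a visible placeholder before grouping. The label "missing" is an arbitrary choice for this sketch, and the example assumes that dfply's group_by hands the grouping to pandas, so the same filled frame should behave the same way when piped through group_by.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": ["a", "a", "b", None],
    "age": [12, 54, 67, 89],
})

# Replace the missing category with a placeholder so the row is not
# dropped when it is used as a group key. "missing" is an arbitrary label.
filled = df.assign(category=df["category"].fillna("missing"))

# Count ids per category, mirroring summarize(total = n(X.id)).
counts = filled.groupby("category")["id"].count().reset_index(name="total")
print(counts)
```

The drawback is that the placeholder is an ordinary string afterwards, so it has to be mapped back to None if the missing value matters downstream.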

Oct 31 '19 12:10 pachoning

Is there a way to pass a dropna argument to group_by(), e.g. dropna=False to keep None/NaN groups? (Pandas ref: https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.1.0.html#allow-na-in-groupby-key) cc: @kieferk

Jan 17 '22 16:01 sundarcf
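For reference, the pandas behavior linked above can be sketched as follows. This uses pandas directly (version 1.1 or later), not dfply. Whether dfply's group_by can forward a dropna argument is not confirmed in this thread, so the sketch only shows what the underlying groupby does.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": ["a", "a", "b", None],
    "age": [12, 54, 67, 89],
})

# Default behavior: rows whose group key is None/NaN are dropped.
default_counts = df.groupby("category")["id"].count()

# dropna=False (pandas >= 1.1) keeps the missing key as its own group.
counts = (
    df.groupby("category", dropna=False)["id"]
    .count()
    .reset_index(name="total")
)

print(default_counts)
print(counts)
```

With dropna=False the row for id 4 appears as a third group whose key is missing, which matches the result the original post expected.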
