dfply
group_by does not take None values into account
Hi!
I have a DataFrame with three variables: `id`, `category`, and `age`.
```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'category': ['a', 'a', 'b', None], 'age': [12, 54, 67, 89]})
```
I am grouping with `group_by` on the `category` variable, which contains a `None` value:
```python
from dfply import *  # provides X, group_by, summarize and n

df >> group_by(X.category) >> summarize(total=n(X.id))
```
The result is:
category | total |
---|---|
a | 2 |
b | 1 |
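For comparison, here is a minimal plain-pandas sketch that reproduces the same two rows. It assumes (not checked against dfply's source here) that dfply's grouped `summarize` ends up calling `DataFrame.groupby` with its default arguments. With the default `dropna=True`, the row whose `category` is `None` is excluded before aggregation.

```python
import pandas as pd

# Same data as in the report above.
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'category': ['a', 'a', 'b', None],
                   'age': [12, 54, 67, 89]})

# Plain pandas groupby with default arguments (dropna=True):
# rows whose grouping key is missing (None/NaN) are dropped.
counts = df.groupby('category')['id'].size().reset_index(name='total')
print(counts)  # two rows only: a -> 2, b -> 1
```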
Shouldn't the result be the following instead?
category | total |
---|---|
a | 2 |
b | 1 |
None | 1 |
Even if I convert the `None` to `np.nan`, I get the same result.
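This part is consistent with pandas itself: pandas treats both `None` and `np.nan` as missing values, so switching from one to the other does not change how `groupby` handles the key. A small plain-pandas check (no dfply involved):

```python
import numpy as np
import pandas as pd

# pandas considers both None and np.nan to be missing.
print(pd.isna(None), pd.isna(np.nan))  # True True

df_none = pd.DataFrame({'category': ['a', 'a', 'b', None], 'id': [1, 2, 3, 4]})
df_nan = pd.DataFrame({'category': ['a', 'a', 'b', np.nan], 'id': [1, 2, 3, 4]})

# With the default dropna=True, the missing key is dropped
# whether it was written as None or as np.nan.
sizes_none = df_none.groupby('category').size()
sizes_nan = df_nan.groupby('category').size()
print(sizes_none.equals(sizes_nan))
```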
Is there a way to pass a `dropna` argument to `group_by()`, e.g. `dropna=False`, so that missing keys are kept as their own group?
(Pandas Ref: https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.1.0.html#allow-na-in-groupby-key)
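For reference, this is what the linked pandas 1.1.0 option does in plain pandas: passing `dropna=False` to `DataFrame.groupby` keeps missing keys as a separate group. The sketch does not use dfply at all, since whether `group_by()` could forward such an argument is exactly the open question. The second variant, filling missing keys with a placeholder label before grouping (the string `'None'` here is an arbitrary, hypothetical choice), is a common workaround that also works on pandas older than 1.1; it is not a dfply feature.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'category': ['a', 'a', 'b', None],
                   'age': [12, 54, 67, 89]})

# pandas >= 1.1: dropna=False keeps the row with a missing key as its own group.
with_na = (df.groupby('category', dropna=False)['id']
             .size()
             .reset_index(name='total'))

# Version-independent alternative: replace missing keys with a placeholder
# label ('None' is an arbitrary, hypothetical choice) before grouping.
filled = (df.assign(category=df['category'].fillna('None'))
            .groupby('category')['id']
            .size()
            .reset_index(name='total'))

print(with_na)
print(filled)
```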
cc: @kieferk