polars
pl.Expr.cum_count should not count nulls; also create pl.cum_count for counting rows
Description
pl.Expr.count was recently changed to not count nulls, but pl.Expr.cum_count was not changed along with it. If pl.Expr.cum_count is changed to skip nulls as well, it would be nice to have a pl.cum_count that counts rows, so that the result is not affected by nulls in the pl.first() column.
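To make the two requested behaviours concrete, here is a minimal plain-Python sketch. It is not the polars implementation: the function names `cum_count_non_null` and `cum_row_count` are hypothetical, and `None` stands in for a null value.

```python
from itertools import accumulate


def cum_count_non_null(values):
    """Running count of non-null values (the behaviour requested for pl.Expr.cum_count).

    Hypothetical helper for illustration only; None plays the role of null.
    A null position does not increase the count, it carries the running total forward.
    """
    return list(accumulate(0 if v is None else 1 for v in values))


def cum_row_count(values):
    """Running count of rows, nulls included (the behaviour requested for pl.cum_count).

    Hypothetical helper for illustration only.
    """
    return list(range(1, len(values) + 1))


column = [None, "a", None, "b"]
print(cum_count_non_null(column))  # [0, 1, 1, 2]
print(cum_row_count(column))       # [1, 2, 3, 4]
```

The row count depends only on the length of the column, which is why it would not need to go through pl.first() or any other column that might contain nulls.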
Reference issue: #5396
I am working on the cum_count function.
Also, cum_count starts at 0 which is definitely wrong.
@stinodego To be honest, if it is wrong for cum_count to start at 0, then why do we use 0 as the index for the first element of a list? By that logic it should be 1. People have used lists and pandas (where cumcount starts at 0) for many years, so I think it may be hard to change the habit.
An index starts at 0. Indexing has nothing to do with cum_count, which is an aggregation function.
cum_count returns the equivalent of the count aggregation at each point in the column. If we take the count of a column consisting of a single non-null element, the result is 1, not 0.
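The argument above can be sketched in plain Python by defining the cumulative count directly in terms of the count aggregation over each prefix of the column. This is only an illustration of the reasoning, not polars code: `count` and `cum_count` here are hypothetical stand-ins, and `None` plays the role of null.

```python
def count(values):
    """Aggregation: number of non-null values (None stands in for null)."""
    return sum(v is not None for v in values)


def cum_count(values):
    """Cumulative count: the count aggregation applied to each prefix of the column.

    Deliberately naive (quadratic) so that the definition is visible in the code.
    """
    return [count(values[: i + 1]) for i in range(len(values))]


# A single non-null element has count 1, so the cumulative count starts at 1.
assert count(["a"]) == 1
assert cum_count(["a"]) == [1]

# The last cumulative value always agrees with the plain aggregation.
column = ["a", None, "b"]
assert cum_count(column) == [1, 1, 2]
assert cum_count(column)[-1] == count(column)
```

Under this definition a 0-based result would mean the first prefix, which already contains one non-null value, has a count of 0. That is the inconsistency being pointed out.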