
pl.Expr.cum_count should not count nulls ALSO create pl.cum_count for counting rows

Open deanm0000 opened this issue 1 year ago • 3 comments

Description

pl.Expr.count was recently changed to not count nulls, but pl.Expr.cum_count was not changed to match. If pl.Expr.cum_count is changed to skip nulls, it would be nice to have a pl.cum_count that counts rows regardless of whether pl.first() contains nulls.

deanm0000 avatar Jan 05 '24 19:01 deanm0000
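
To make the two requested behaviours concrete, here is a minimal sketch in plain Python. It is not the Polars implementation; the helper names `cum_count_non_null` and `cum_row_count` are hypothetical, and `None` stands in for a null value.

```python
from itertools import accumulate


def cum_count_non_null(values):
    """Running count of non-null entries (nulls do not increment the count)."""
    return list(accumulate(0 if v is None else 1 for v in values))


def cum_row_count(values):
    """Running count of rows, unaffected by nulls in the column."""
    return list(range(1, len(values) + 1))


column = [None, "a", None, "b"]
print(cum_count_non_null(column))  # [0, 1, 1, 2]
print(cum_row_count(column))       # [1, 2, 3, 4]
```

The two results differ only where the column holds nulls, which is why a separate row-counting function is useful once the expression version skips them.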

Reference issue: #5396

mcrumiller avatar Jan 05 '24 19:01 mcrumiller

I am working on the cum_count function.

stinodego avatar Jan 06 '24 05:01 stinodego

Also, cum_count starts at 0 which is definitely wrong.

stinodego avatar Jan 06 '24 08:01 stinodego

@stinodego to be honest, if cum_count starting at 0 is wrong, then why use 0 as the index when you get the first element from a list? It should be 1. People have used lists and pandas (where cumcount starts at 0) for many years. I think it may be hard to change the habit.

sun-rs avatar Jan 31 '24 02:01 sun-rs

An index starts at 0. Indexing has nothing to do with cum_count, which is an aggregation function.

cum_count returns the equivalent of the count aggregation at each point in the column. If we take the count of a column consisting of a single non-null element, the result is 1, not 0.

stinodego avatar Jan 31 '24 04:01 stinodego
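
The argument above can be sketched in plain Python by defining a cumulative count as the count aggregation applied to each prefix of the column. This illustrates the reasoning only and is not Polars code; `count` and `cum_count` here are hypothetical stand-ins, with `None` representing null.

```python
def count(values):
    """Count aggregation: number of non-null entries."""
    return sum(v is not None for v in values)


def cum_count(values):
    """Count aggregation evaluated on each prefix of the column."""
    return [count(values[: i + 1]) for i in range(len(values))]


print(count(["x"]))                 # 1
print(cum_count(["x"]))             # [1]
print(cum_count(["x", None, "y"]))  # [1, 1, 2]
```

Under this definition the last cumulative value always equals the plain count, so a single non-null element gives 1 rather than 0.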