polars
add `Expr.coalesce`
Description
I am surprised that coalesce is accessible at the top module level, but it is missing from the expression class.
I would like to be able to do pl.col('a', 'b').coalesce(). I think it would make this function easier for newcomers to discover.
I get what you mean but I think this is problematic. Polars needs some clear rules/guidelines for the future on the direction in which expressions operate!
Pandas has the axis parameter, which many polars users dislike, and I can understand why.
BUT how can polars make this easy for the user to understand?
Problem
pl.col("a", "b").sum() is NOT the sum of "a" + "b" but sum("a") and sum("b").
If you do want the "horizontal sum", you need to do this instead: pl.sum_horizontal("a", "b")
So with your coalesce example, pl.col('a', 'b').coalesce() would desugar to pl.col('a').coalesce(), pl.col('b').coalesce(), which ofc makes no sense.
I think there is a big need for the polars team / community to come up with very strict rules / guidelines for how this can be structured in the future from an API perspective, because polars functionality will grow bigger and bigger, and having a clear idea of how to separate horizontal/vertical operations is critical imo!
If we did, the syntax would be col.coalesce(other_cols); the original suggestion (as @JulianCologne points out) would actually broadcast the method to each column ;))
I don't see any problem having this at the expression level though.
Doesn't pl.coalesce(column1, columns2, value1, ...) seem a bit more intuitive here?
@deanm0000 I think there is potential confusion when a function can easily work in both directions, and in those cases we should make sure to make that clear in the API docs, e.g. that pl.sum() is vertical and if you want horizontal, you need to do pl.sum_horizontal(). For coalesce, vertical doesn't make sense, similar to how we don't need to do pl.col('a').eq_horizontal(pl.col('b')).
"Doesn't pl.coalesce(column1, columns2, value1, ...) seem a bit more intuitive here?"
It does to me too, but if someone else likes pl.col(column1).coalesce(column2, value1, ...) and we're just talking about adding that syntax rather than taking away pl.coalesce(column1, columns2, value1, ...), then c'est la vie.
"we're just talking about adding that syntax rather"
Ahh, did not realize pl.coalesce already exists.
I'm actually going to close this as 'not planned'. We already have pl.coalesce at the top level. In this topic it's clear that there is some ambiguity as to what Expr.coalesce should do, especially when operating on multi-column expressions.
Let's just keep one way to do this for now.
A fluent API for coalesce at the expression level seems helpful to me, though I see the points about it being unintuitive and ambiguous when acting on multiple columns.
For posterity, many of my use cases can be implemented with Expr.fill_null() or Expr.replace().