
add `Expr.coalesce`

Open • gab23r opened this issue • 6 comments

Description

I am surprised that coalesce is accessible at the top module level but is missing from the expression class. I would like to be able to do pl.col('a', 'b').coalesce(). I think this would make the function easier for newcomers to discover.

gab23r • Jan 19 '24 09:01

I get what you mean but I think this is problematic. Polars needs some clear rules/guidelines for the future on the direction in which expressions operate!

Pandas has the axis parameter, which many Polars users dislike, and I can understand why. BUT how can Polars make this easy for the user to understand?

Problem

pl.col("a", "b").sum() is NOT the sum of "a" + "b" but sum("a") and sum("b"). If you do want the "horizontal sum", you need to do this instead: pl.sum_horizontal("a", "b")

So with your coalesce example, pl.col('a', 'b').coalesce() would desugar to pl.col('a').coalesce(), pl.col('b').coalesce(), which of course makes no sense.

I think there is a big need for the Polars team / community to come up with very strict rules / guidelines on how this should be structured in the future from an API perspective, because Polars functionality will grow bigger and bigger, and having a clear idea of how to separate horizontal/vertical operations is critical imo!

Julian-J-S • Jan 19 '24 10:01

If we did, the syntax would be col.coalesce(other_cols); the original suggestion (as @JulianCologne points out) would actually broadcast the method to each column ;))

I don't see any problem having this at the expression level though.

alexander-beedie • Jan 19 '24 12:01

Doesn't pl.coalesce(column1, column2, value1, ...) seem a bit more intuitive here?

mcrumiller • Jan 19 '24 13:01

@deanm0000 I think there is potential confusion when a function can easily work in both directions, and in those cases we should make sure to make that clear in the API docs, e.g. that pl.sum() is vertical and if you want horizontal, you need to do pl.sum_horizontal(). For coalesce, vertical doesn't make sense, similar to how we don't need to do pl.col('a').eq_horizontal(pl.col('b')).

mcrumiller • Jan 19 '24 13:01

> Doesn't pl.coalesce(column1, column2, value1, ...) seem a bit more intuitive here?

It does to me too, but if someone else likes pl.col(column1).coalesce(column2, value1, ...) and we're just talking about adding that syntax rather than taking away pl.coalesce(column1, column2, value1, ...), then c'est la vie.

deanm0000 • Jan 19 '24 17:01

> we're just talking about adding that syntax rather

Ahh, did not realize pl.coalesce already exists.

mcrumiller • Jan 19 '24 17:01

I'm actually going to close this as 'not planned'. We already have pl.coalesce at the top level. In this topic it's clear that there is some ambiguity as to what Expr.coalesce should do, especially when operating on multi-column expressions.

Let's just keep one way to do this for now.

stinodego • Mar 21 '24 08:03

A fluent API for coalesce at the expression level seems helpful to me, though I see the points about it being unintuitive and ambiguous when acting on multiple columns.

For posterity, many of my use cases can be implemented with Expr.fill_null() or Expr.replace().

mesner • Jun 24 '24 15:06