polars icon indicating copy to clipboard operation
polars copied to clipboard

Coalesce functionality

Open Houski opened this issue 1 year ago • 3 comments

Problem Description

The ability to select two (or more?) columns horizontally, taking first non-null value.

It appears Pandas has implemented similar functionality but called it "combine_first". https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.combine_first.html

Here is the SQL documentation on the functionality of coalesce from Postgres. https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL https://stackoverflow.com/questions/43177685/how-to-implement-sql-coalesce-in-pandas

Houski avatar Sep 17 '22 16:09 Houski

You can easily write it yourself as an expression:

df.with_column(
    pl.when(pl.col("a").is_null()) 
    .then(pl.col("b"))
    .otherwise(pl.col("a"))
    .alias("combine_first")
) 

ghuls avatar Sep 18 '22 09:09 ghuls

If there are more than two columns, then better use fill_null.

df = pl.DataFrame({"a": [None, None, None], "b": [1, 2, None], "c": [1, None, 3]})
df.with_column((c("a").fill_null(c("b")).fill_null(c("c"))).alias("combine_first"))

leoliu0 avatar Sep 19 '22 10:09 leoliu0

@Houski: FYI, @ritchie46 just added first-class coalesce support in https://github.com/pola-rs/polars/pull/4931. Will be available in the next release.

Example:

df.select( pl.coalesce(["a", "b", "c", 10]) )

alexander-beedie avatar Sep 22 '22 11:09 alexander-beedie