polars
polars copied to clipboard
Coalesce functionality
Problem Description
The ability to select two (or more?) columns horizontally, taking first non-null value.
It appears Pandas has implemented similar functionality but called it "combine_first". https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.combine_first.html
Here is the SQL documentation on the functionality of coalesce from Postgres. https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL https://stackoverflow.com/questions/43177685/how-to-implement-sql-coalesce-in-pandas
You can easily write it yourself as an expression:
df.with_column(
pl.when(pl.col("a").is_null())
.then(pl.col("b"))
.otherwise(pl.col("a"))
.alias("combine_first")
)
If there are more than two columns, then better use fill_null.
df = pl.DataFrame({"a": [None, None, None], "b": [1, 2, None], "c": [1, None, 3]})
df.with_column((c("a").fill_null(c("b")).fill_null(c("c"))).alias("combine_first"))
@Houski: FYI, @ritchie46 just added first-class coalesce
support in https://github.com/pola-rs/polars/pull/4931.
Will be available in the next release.
Example:
df.select( pl.coalesce(["a", "b", "c", 10]) )