[C++][Compute] Accept `JoinOptions` in `binary_join` to handle nulls
Describe the enhancement requested
Short
I'd like `binary_join` to handle nulls in the same way as `binary_join_element_wise`, by accepting `JoinOptions`.
Longer
I've recently been trying to use `binary_join` to get behavior similar to `polars.Expr.list.join`, which has an `ignore_nulls` argument:
Show polars
I personally wouldn't want to propagate nulls, but being able to opt in to ignoring them would be helpful.
import polars as pl
data = {
    "a": [
        ["a", "b", "c"],
        [None, None, None],
        [None, None, "1", "2", None, "3", None],
        ["x", "y"],
        ["1", None, "3"],
        [None],
        None,
        [],
        [None, None],
    ]
}
result = pl.DataFrame(data).with_columns(
    propagate_nulls=pl.col("a").list.join("-", ignore_nulls=False),
    ignore_nulls=pl.col("a").list.join("-", ignore_nulls=True),
)
print(result)
shape: (9, 3)
┌──────────────────────┬─────────────────┬──────────────┐
│ a                    ┆ propagate_nulls ┆ ignore_nulls │
│ ---                  ┆ ---             ┆ ---          │
│ list[str]            ┆ str             ┆ str          │
╞══════════════════════╪═════════════════╪══════════════╡
│ ["a", "b", "c"]      ┆ a-b-c           ┆ a-b-c        │
│ [null, null, null]   ┆ null            ┆              │
│ [null, null, … null] ┆ null            ┆ 1-2-3        │
│ ["x", "y"]           ┆ x-y             ┆ x-y          │
│ ["1", null, "3"]     ┆ null            ┆ 1-3          │
│ [null]               ┆ null            ┆              │
│ null                 ┆ null            ┆ null         │
│ []                   ┆                 ┆              │
│ [null, null]         ┆ null            ┆              │
└──────────────────────┴─────────────────┴──────────────┘
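To spell out the semantics in the table above, here is a minimal pure-Python sketch of the two modes. The helper name `join_list` is hypothetical and exists only for illustration; it is a reference model of the behavior shown, not how polars or Arrow implement it.

```python
from typing import Optional, Sequence


def join_list(
    items: Optional[Sequence[Optional[str]]],
    sep: str,
    ignore_nulls: bool = False,
) -> Optional[str]:
    """Reference model for joining one list of strings (hypothetical helper)."""
    # A null list stays null in both modes.
    if items is None:
        return None
    if ignore_nulls:
        # Drop null elements; an all-null or empty list yields "".
        return sep.join(x for x in items if x is not None)
    # Propagate: any null element makes the whole result null.
    if any(x is None for x in items):
        return None
    return sep.join(items)


rows = [
    ["a", "b", "c"],
    [None, None, None],
    [None, None, "1", "2", None, "3", None],
    ["x", "y"],
    ["1", None, "3"],
    [None],
    None,
    [],
    [None, None],
]
propagated = [join_list(r, "-") for r in rows]
ignored = [join_list(r, "-", ignore_nulls=True) for r in rows]
```

Here `propagated` corresponds to the `propagate_nulls` column and `ignored` to the `ignore_nulls` column.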
Show pyarrow
import pyarrow as pa
import pyarrow.compute as pc
data = {
    "a": [
        ["a", "b", "c"],
        [None, None, None],
        [None, None, "1", "2", None, "3", None],
        ["x", "y"],
        ["1", None, "3"],
        [None],
        None,
        [],
        [None, None],
    ]
}
pc.binary_join(pa.array(data["a"]), "-")
<pyarrow.lib.StringArray object at 0x000001E7AC7F93C0>
[
  "a-b-c",
  null,
  null,
  "x-y",
  null,
  null,
  null,
  "",
  null
]
It is possible to get the same behavior today with a workaround (linked below), but I'd much rather be able to write `null_handling="skip"` 🙏
- https://github.com/narwhals-dev/narwhals/blob/e68d9ab9b12562848602e7a0d2f7baf80bc0576a/narwhals/_plan/arrow/functions.py#L600-L700
Component(s)
C++, Python