
[C++][Compute] Accept `JoinOptions` in `binary_join` to handle nulls

Open dangotbanned opened this issue 2 weeks ago • 0 comments

Describe the enhancement requested

Short

I'd like to be able to handle nulls in binary_join in the same way as binary_join_element_wise.

Longer

I've recently been trying to use binary_join to get behavior similar to polars.Expr.list.join, which has an ignore_nulls argument:

Show polars

I personally wouldn't want to propagate nulls, but being able to opt in to ignoring them would be helpful.

import polars as pl

data = {
    "a": [
        ["a", "b", "c"],
        [None, None, None],
        [None, None, "1", "2", None, "3", None],
        ["x", "y"],
        ["1", None, "3"],
        [None],
        None,
        [],
        [None, None],
    ]
}
result = pl.DataFrame(data).with_columns(
    propagate_nulls=pl.col("a").list.join("-", ignore_nulls=False),
    ignore_nulls=pl.col("a").list.join("-", ignore_nulls=True),
)
print(result)
shape: (9, 3)
┌──────────────────────┬─────────────────┬──────────────┐
│ a                    ┆ propagate_nulls ┆ ignore_nulls │
│ ---                  ┆ ---             ┆ ---          │
│ list[str]            ┆ str             ┆ str          │
╞══════════════════════╪═════════════════╪══════════════╡
│ ["a", "b", "c"]      ┆ a-b-c           ┆ a-b-c        │
│ [null, null, null]   ┆ null            ┆              │
│ [null, null, … null] ┆ null            ┆ 1-2-3        │
│ ["x", "y"]           ┆ x-y             ┆ x-y          │
│ ["1", null, "3"]     ┆ null            ┆ 1-3          │
│ [null]               ┆ null            ┆              │
│ null                 ┆ null            ┆ null         │
│ []                   ┆                 ┆              │
│ [null, null]         ┆ null            ┆              │
└──────────────────────┴─────────────────┴──────────────┘
Show pyarrow

import pyarrow as pa
import pyarrow.compute as pc

data = {
    "a": [
        ["a", "b", "c"],
        [None, None, None],
        [None, None, "1", "2", None, "3", None],
        ["x", "y"],
        ["1", None, "3"],
        [None],
        None,
        [],
        [None, None],
    ]
}


pc.binary_join(pa.array(data["a"]), "-")
<pyarrow.lib.StringArray object at 0x000001E7AC7F93C0>
[
  "a-b-c",
  null,
  null,
  "x-y",
  null,
  null,
  null,
  "",
  null
]

It is possible to get the same behavior today with a workaround (linked below), but I'd much rather be able to write null_handling="skip" 🙏

  • https://github.com/narwhals-dev/narwhals/blob/e68d9ab9b12562848602e7a0d2f7baf80bc0576a/narwhals/_plan/arrow/functions.py#L600-L700
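
To pin down the semantics being asked for, here is a plain-Python reference sketch. It is not how Arrow or the linked workaround implements anything; join_list is a hypothetical helper, and its null_handling values simply mirror the "emit_null" and "skip" names from JoinOptions.

```python
from typing import Optional, Sequence


def join_list(
    row: Optional[Sequence[Optional[str]]],
    separator: str,
    null_handling: str = "emit_null",
) -> Optional[str]:
    """Hypothetical reference semantics for joining one list row."""
    if row is None:
        # A null list stays null in both modes.
        return None
    if null_handling == "emit_null":
        # Current binary_join behavior: one null element nulls the row.
        if any(item is None for item in row):
            return None
        return separator.join(row)
    if null_handling == "skip":
        # Requested behavior: drop null elements, then join what is left.
        return separator.join(item for item in row if item is not None)
    raise ValueError(f"unsupported null_handling: {null_handling!r}")


rows = [
    ["a", "b", "c"],
    [None, None, None],
    [None, None, "1", "2", None, "3", None],
    ["x", "y"],
    ["1", None, "3"],
    [None],
    None,
    [],
    [None, None],
]

propagated = [join_list(row, "-") for row in rows]
skipped = [join_list(row, "-", null_handling="skip") for row in rows]
print(propagated)
print(skipped)
```

On the data from this issue, the "emit_null" branch reproduces the current binary_join output and the "skip" branch reproduces the polars ignore_nulls=True column.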

Component(s)

C++, Python

dangotbanned • Dec 13 '25 22:12