
[BUG]: Incorrect result in non-coalesce join with experimental streaming executor and multiple partitions

Open · TomAugspurger opened this issue 6 months ago · 2 comments

Describe the bug

The test python/cudf_polars/tests/test_join.py::test_non_coalesce_join[left-nulls_not_equal-join_expr0] fails when using a small blocksize / multiple partitions.

Steps/Code to reproduce bug

Here's a simplified example:

import polars as pl

from cudf_polars.testing.asserts import assert_gpu_result_equal

left = pl.LazyFrame(
    {
        "a": [1, 2, 3, 1, None],
        "b": [1, 2, 3, 4, 5],
        "c": [2, 3, 4, 5, 6],
    }
)

right = pl.LazyFrame(
    {
        "a": [1, 4, 3, 7, None, None, 1],
        "c": [2, 3, 4, 5, 6, 7, 8],
        "d": [6, None, 7, 8, -1, 2, 4],
    }
)

q = left.join(right, on=pl.col("a"), how="inner", nulls_equal=False, coalesce=False)
assert_gpu_result_equal(
    q,
    engine=pl.GPUEngine(
        executor="streaming",
        executor_options={"max_rows_per_partition": 3},
    ),
)

which fails with:

AssertionError: DataFrames are different (value mismatch for column 'a')
[left]:  [1, 1, 3, 1, 1]
[right]: [1, 3, 1, 1, 1]

Expected behavior

Match polars / no error.

TomAugspurger avatar Jun 12 '25 21:06 TomAugspurger

Oh, this may just be a row-order issue. It passes with check_row_order=False. We need to confirm whether that's expected (and, if so, document that behavior).
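As a quick sanity check (pure Python, using the column values reported by the failing assertion above), sorting both sides shows they contain the same values and differ only in row order:

```python
# Column values from the failing assertion for column 'a'.
left_vals = [1, 1, 3, 1, 1]
right_vals = [1, 3, 1, 1, 1]

# Same multiset of values, just in a different order.
print(sorted(left_vals) == sorted(right_vals))  # True
```

This is consistent with the test passing once the comparison ignores row order (check_row_order=False).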

TomAugspurger avatar Jun 12 '25 21:06 TomAugspurger

Newer polars versions don't provide any ordering guarantees on left joins (and maybe we preserve ordering only for backwards compatibility). I think this is just a "row order is not guaranteed, and that's fine" problem.

wence- avatar Jun 13 '25 15:06 wence-

I missed one set of errors previously that might be a distinct issue. https://github.com/rapidsai/cudf/blob/4466e55ba153ef13fa2327c3f0e00447aeb09b97/python/cudf_polars/tests/test_join.py#L91-L97 is a set of tests that perform a join followed by a slice. It makes sense for those to fail without a maintain_order (which we don't support) or a sort. But I was surprised by the error for pytest --executor=streaming --blocksize-mode=small python/cudf_polars/tests/test_join.py::test_left_join_with_slice[nulls_equal-zlice3]

AssertionError: DataFrames are different (number of rows does not match)
[left]:  2
[right]: 4

I would expect the same number of rows, but potentially different ones depending on which were selected. Maybe we're applying the slice to each partition and then concatenating the results? Here's an isolated example:

import polars as pl
from cudf_polars.testing.asserts import assert_gpu_result_equal

left = pl.LazyFrame(
    {
        "a": [1, 2, 3, 1, None],
        "b": [1, 2, 3, 4, 5],
        "c": [2, 3, 4, 5, 6],
    }
)

right = pl.LazyFrame(
    {
        "a": [1, 4, 3, 7, None, None, 1],
        "c": [2, 3, 4, 5, 6, 7, 8],
        "d": [6, None, 7, 8, -1, 2, 4],
    }
)
q = left.join(right, on="a", how="inner").slice(0, 2)
engine = pl.GPUEngine(
    executor="streaming",
    raise_on_fail=True,
    executor_options={"max_rows_per_partition": 3},
)
assert_gpu_result_equal(q, engine=engine)

And the failure:

Traceback (most recent call last):
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/debug.py", line 21, in <module>
    assert_gpu_result_equal(q, engine=engine)
  File "/home/nfs/toaugspurger/gh/rapidsai/cudf/python/cudf_polars/cudf_polars/testing/asserts.py", line 134, in assert_gpu_result_equal
    assert_frame_equal(
  File "/raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 128, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/polars/testing/asserts/frame.py", line 101, in assert_frame_equal
    raise_assertion_error(
  File "/raid/toaugspurger/envs/rapidsai/cudf/25.08/lib/python3.12/site-packages/polars/testing/asserts/utils.py", line 12, in raise_assertion_error
    raise AssertionError(msg) from cause
AssertionError: DataFrames are different (number of rows does not match)
[left]:  2
[right]: 5
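The suspected behavior can be simulated in plain Python (the partition contents and helper names below are hypothetical, not cudf_polars APIs): slicing each partition independently and then concatenating yields a different row count than slicing the logically concatenated table.

```python
# Hypothetical partitioned rows (row ids are made up for illustration).
partitions = [[10, 11, 12], [13, 14, 15], [16]]

def global_slice(parts, offset, length):
    # Correct semantics: slice the logically concatenated table.
    rows = [r for p in parts for r in p]
    return rows[offset:offset + length]

def per_partition_slice(parts, offset, length):
    # Suspected buggy semantics: slice each partition, then concatenate.
    return [r for p in parts for r in p[offset:offset + length]]

print(global_slice(partitions, 0, 2))         # [10, 11] -> 2 rows
print(per_partition_slice(partitions, 0, 2))  # [10, 11, 13, 14, 16] -> 5 rows
```

The 2-vs-5 row counts mirror the assertion failure above.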

TomAugspurger avatar Jun 16 '25 19:06 TomAugspurger

Aha, good catch. It makes sense that we can't assume correct results for a slice after a Join. However, we definitely shouldn't get the wrong row count. It looks like we need to register lower_ir_node logic for Slice. We also need to update the lower_ir_node logic for Join to separate the fused slice operation into a distinct IR node (we probably need to be careful with every IR node that can contain a fused slice operation).
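One way a distinct Slice node could behave over partitioned input is to track a running offset/length budget across partitions, so the global result has exactly the requested number of rows. This is a hedged plain-Python sketch (streaming_slice is a hypothetical name, not a cudf_polars function):

```python
def streaming_slice(partitions, offset, length):
    # Hypothetical sketch: trim each partition against a running
    # offset/length budget instead of slicing each partition independently.
    out = []
    to_skip, to_take = offset, length
    for part in partitions:
        if to_take == 0:
            break  # budget exhausted; remaining partitions are dropped
        if to_skip >= len(part):
            to_skip -= len(part)  # this whole partition falls before the slice
            continue
        chunk = part[to_skip:to_skip + to_take]
        to_skip = 0
        to_take -= len(chunk)
        out.extend(chunk)
    return out

print(streaming_slice([[1, 2, 3], [4, 5, 6], [7]], 2, 3))  # [3, 4, 5]
print(streaming_slice([[1, 2, 3], [4, 5, 6], [7]], 0, 2))  # [1, 2]
```

In a real lowering, the same bookkeeping would apply to table partitions rather than Python lists.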

rjzamora avatar Jun 16 '25 20:06 rjzamora

Starting to work on a fix for the row-count problem: https://github.com/rapidsai/cudf/pull/19187

rjzamora avatar Jun 17 '25 18:06 rjzamora

Resolved in #19187

vyasr avatar Jun 27 '25 00:06 vyasr