[BUG]: Dtype mismatch between partitioned and non-partitioned aggregation with experimental streaming executor in some aggregations

Open TomAugspurger opened this issue 7 months ago • 0 comments

Describe the bug

With the streaming executor, this snippet produces a result whose dtype depends on whether the input is split into more than one partition.

Steps/Code to reproduce bug

import polars as pl

df = pl.LazyFrame({"a": [-10, 4, 5, 2, 3, 6, 8, 9, 4, 4, 5, 2, 3, 7, 3, 6, -10, -11]})
q = df.select(pl.col("a").n_unique())

print("cpu            :", q.collect().dtypes)
print("gpu-single     :", q.collect(engine=pl.GPUEngine(executor="streaming")).dtypes)
print("gpu-partitioned:", q.collect(engine=pl.GPUEngine(executor="streaming", executor_options={"max_rows_per_partition": 9})).dtypes)

Expected behavior

All three outputs should report the same dtypes.
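The expectation above could be turned into an automated check. This is a minimal sketch, assuming a hypothetical helper `find_dtype_mismatches` (not part of polars or cudf-polars). It takes a mapping from a label to the `.dtypes` list of each collected result and treats the first entry as the reference.

```python
def find_dtype_mismatches(dtypes_by_label):
    """Return the entries whose dtype list differs from the first entry's.

    `dtypes_by_label` maps a label (e.g. "cpu") to the `.dtypes` list of a
    collected result. The first entry is used as the reference. This helper
    is hypothetical and only compares values with `==`.
    """
    items = list(dtypes_by_label.items())
    if not items:
        return {}
    _, reference = items[0]
    reference = list(reference)
    return {
        label: list(dtypes)
        for label, dtypes in items[1:]
        if list(dtypes) != reference
    }


# Usage with the reproducer above (needs a GPU-enabled polars install):
# mismatches = find_dtype_mismatches({
#     "cpu": q.collect().dtypes,
#     "gpu-single": q.collect(engine=pl.GPUEngine(executor="streaming")).dtypes,
#     "gpu-partitioned": q.collect(
#         engine=pl.GPUEngine(
#             executor="streaming",
#             executor_options={"max_rows_per_partition": 9},
#         )
#     ).dtypes,
# })
# assert not mismatches, mismatches
```

An empty return value means every engine configuration agreed with the CPU result. A non-empty one names the configurations that diverged.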

Additional context

This is possibly a duplicate of https://github.com/rapidsai/cudf/issues/15852, but I'm opening a separate issue because I'm surprised to see a difference in the streaming executor's output depending on whether there's one partition or multiple partitions.

Jun 12 '25 21:06 TomAugspurger