Require using rapidsmpf Stream Pool with rapidsmpf runtime

Open TomAugspurger opened this issue 1 month ago • 0 comments

Spotted in https://github.com/rapidsai/cudf/pull/20662#discussion_r2578159439: rapidsmpf's native read_parquet node produces data that is stream-ordered on some CUDA stream from rapidsmpf's stream pool. It's not clear to me how this interacts with the cudf-polars non-pool CUDAStreamPolicy options ("new" or "default"): we would need to ensure that the data from rapidsmpf's native nodes is synchronized with the stream we attach to the dataframe.

I'd recommend just requiring that the rapidsmpf runtime uses the rapidsmpf stream pool (erroring when creating the ConfigOptions from the polars engine if not).
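
A rough sketch of what that check might look like, assuming the validation runs when the options object is constructed. `SketchConfigOptions`, its field names, and the error message are hypothetical stand-ins for illustration, not the actual cudf-polars `ConfigOptions` API; only the runtime name "rapidsmpf" and the policy names "new" and "default" come from the discussion above.

```python
from dataclasses import dataclass

# Non-pool stream policies named in this issue.
NON_POOL_POLICIES = frozenset({"new", "default"})


@dataclass(frozen=True)
class SketchConfigOptions:
    """Hypothetical stand-in for the real options object (illustration only)."""

    runtime: str
    cuda_stream_policy: str

    def __post_init__(self) -> None:
        # Fail early, at construction time, rather than when a dataframe is
        # later attached to a stream unrelated to the one that produced it.
        if self.runtime == "rapidsmpf" and self.cuda_stream_policy in NON_POOL_POLICIES:
            raise ValueError(
                "The rapidsmpf runtime requires the rapidsmpf stream pool; "
                f"got cuda_stream_policy={self.cuda_stream_policy!r}."
            )
```

Erroring at construction avoids having to reason about cross-stream synchronization for the non-pool policies at all.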

TomAugspurger, Dec 01 '25 22:12