Replace thrust reductions in Parquet reader with CUB + pinned memory based implementations
Description
Contributes to #20722
This PR replaces the thrust::reduce and thrust::transform_reduce calls that incur a pageable device-to-host (D2H) copy with custom implementations based on CUB and pinned host memory.
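To illustrate the pattern (this is a hedged sketch, not the PR's actual code; the function name `sum_on_device` and the plain `int` reduction are hypothetical): a Thrust reduction returns its result into pageable host memory, forcing a blocking copy, whereas CUB can write the result to a device buffer that is then copied into pinned host memory with a genuinely asynchronous D2H transfer on the stream.

```cpp
// Hypothetical sketch: replace a thrust::reduce (which returns the
// result via a pageable D2H copy) with cub::DeviceReduce::Sum plus a
// pinned-memory staging buffer for the async D2H copy.
#include <cub/device/device_reduce.cuh>
#include <cuda_runtime.h>

int sum_on_device(int const* d_in, int num_items, cudaStream_t stream)
{
  int* d_out = nullptr;
  cudaMallocAsync(&d_out, sizeof(int), stream);

  // CUB two-phase call: first query temporary storage size, then reduce.
  void* d_temp      = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, num_items, stream);
  cudaMallocAsync(&d_temp, temp_bytes, stream);
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, num_items, stream);

  // Copy the scalar result into pinned host memory; because the host
  // buffer is pinned, this D2H copy is truly asynchronous on the stream.
  int* h_result = nullptr;
  cudaMallocHost(&h_result, sizeof(int));
  cudaMemcpyAsync(h_result, d_out, sizeof(int),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // single sync point once the value is needed

  int result = *h_result;
  cudaFreeHost(h_result);
  cudaFreeAsync(d_temp, stream);
  cudaFreeAsync(d_out, stream);
  return result;
}
```

In libcudf the pinned buffer and temporary storage would typically come from the library's pinned and device memory resources rather than raw `cudaMallocHost`/`cudaMallocAsync`; the sketch uses raw CUDA calls only to keep the example self-contained.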
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
pre-commit.ci autofix
Another request: please file a CCCL feature request for Thrust to support returning host values via pinned memory. It would help them to know that returning pageable memory and then synchronizing is a pain point for us, even if we don't adopt the feature in the near/medium term.
Ran the Parquet reader benchmarks (single-threaded, multithreaded, and filter variants) and saw no change in performance with this PR.
/ok to test 50dd93e
/merge