Refactor `thrust::reduce` to use `cub::DeviceReduce`
This is a sub-task of the Thrust/CUB kernel consolidation effort (https://github.com/NVIDIA/cccl/issues/26).
Prepare `cub::DeviceReduce` for the feature parity needed by `thrust::reduce`:
- [ ] Introduce the vsmem (virtual shared memory) utility to `cub::DeviceReduce`
- [ ] Add tests to CUB that check that `cub::DeviceReduce` correctly uses the fallback policy (see https://github.com/NVIDIA/cccl/pull/1379/commits/fdf565e6cc063103643ea0e964b2437400721c5e)
- [ ] Add tests to CUB that check that `cub::DeviceReduce` correctly uses virtual shared memory (see https://github.com/NVIDIA/cccl/pull/1379/commits/fdf565e6cc063103643ea0e964b2437400721c5e)
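The behaviour the fallback-policy and virtual-shared-memory tests need to pin down can be sketched on the host. This is a simplified model under stated assumptions, not CUB's implementation: `PolicyFootprint`, `Selection`, and `select_policy` are hypothetical names, the 48 KiB limit is assumed to be the static shared-memory limit per block, and the three-step rule (default policy, then fallback policy, then global-memory-backed "virtual" shared memory) is an approximation of what the vsmem utility is meant to provide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a tuning policy: the bytes of temporary storage one
// thread block would need if that policy's agent were instantiated.
struct PolicyFootprint {
  const char* name;
  std::size_t temp_storage_bytes;
};

// Where the selected agent's per-block temporary storage lives.
enum class StorageKind { SharedMemory, VirtualSharedMemory };

struct Selection {
  const PolicyFootprint* policy;      // policy the kernel would be launched with
  StorageKind storage;                // on-chip or global-memory backed
  std::size_t vsmem_bytes_per_block;  // global memory per block, 0 if unused
};

// Assumed static shared-memory limit per thread block (48 KiB).
constexpr std::size_t kMaxSmemPerBlock = 48 * 1024;

// Simplified selection rule (an assumption for illustration):
// 1. keep the default policy if its temporary storage fits in shared memory;
// 2. otherwise switch to the smaller fallback policy;
// 3. if even the fallback does not fit, keep it but place its temporary
//    storage in global memory ("virtual shared memory").
Selection select_policy(const PolicyFootprint& default_policy,
                        const PolicyFootprint& fallback_policy,
                        std::size_t max_smem = kMaxSmemPerBlock) {
  // A fallback is only useful if it is no larger than the default.
  assert(fallback_policy.temp_storage_bytes <= default_policy.temp_storage_bytes);

  if (default_policy.temp_storage_bytes <= max_smem) {
    return {&default_policy, StorageKind::SharedMemory, 0};
  }
  if (fallback_policy.temp_storage_bytes <= max_smem) {
    return {&fallback_policy, StorageKind::SharedMemory, 0};
  }
  return {&fallback_policy, StorageKind::VirtualSharedMemory,
          fallback_policy.temp_storage_bytes};
}
```

Tests along these lines would cover one case per branch: a default policy that fits, a default that overflows while the fallback fits, and both overflowing. In a device-side implementation the per-block size would presumably be multiplied by the grid size to size a single global allocation, with each block using its own slice.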
Refactor `thrust::reduce` to use `cub::DeviceReduce`:
- [ ] Make `thrust::reduce` use `cub::DeviceReduce` (see https://github.com/NVIDIA/cccl/pull/1379/commits/948817ed034a1f704433e4c5e13444e0b9a75106)
- [ ] Add dynamic 32/64-bit offset type dispatch to `thrust::reduce` (see 948817e L210-216)
- [ ] Add sanity tests for a large number of items for `thrust::reduce` (see https://github.com/NVIDIA/cccl/pull/1379/commits/01f32ddb50c5175154b336af83446ab1dfe8b12a)
- [ ] Add more elaborate testing for `cub::DeviceReduce` (see https://github.com/NVIDIA/cccl/pull/1612)
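The dynamic 32/64-bit offset dispatch can be illustrated with a host-only sketch. `launch_reduce` and `dispatch_reduce` are hypothetical names, not Thrust or CUB API, and the choice of signed `std::int32_t`/`std::int64_t` offsets is an assumption; the referenced commit (948817e L210-216) is the authoritative version.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a reduction launch templated on the offset type
// used to index the input. It only reports the width of OffsetT so the
// dispatch decision can be checked on the host.
template <typename OffsetT>
int launch_reduce(OffsetT num_items) {
  assert(num_items >= 0);
  return static_cast<int>(sizeof(OffsetT) * 8);
}

// Hypothetical runtime dispatch: use 32-bit offsets when the item count fits
// (32-bit index arithmetic is typically cheaper on GPUs) and 64-bit offsets
// otherwise.
int dispatch_reduce(std::int64_t num_items) {
  if (num_items <= std::numeric_limits<std::int32_t>::max()) {
    return launch_reduce(static_cast<std::int32_t>(num_items));
  }
  return launch_reduce(num_items);  // already std::int64_t
}
```

Both instantiations are compiled and only the one matching the runtime count executes, so the cost of this design is compile time and binary size rather than runtime. Item counts just above `INT32_MAX` are the natural inputs for the large-item sanity tests, since they are the smallest sizes that take the 64-bit path.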