cccl
[DOC]: Add CUB examples utilizing multi-dimensional thread blocks
Is this a duplicate?
- [X] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Is this for new documentation, or an update to existing docs?
Update
Describe the incorrect/future/missing documentation
Applying CUB block-level algorithms in multi-dimensional thread blocks is a constant source of confusion for our users. Most of our examples illustrate block-level algorithms only in one-dimensional thread blocks. As discussed in https://github.com/NVIDIA/cccl/discussions/1653, we should extend our documentation with examples of the multi-dimensional case for each of the following algorithms:
### Tasks
- [ ] block adjacent difference
- [ ] block discontinuity
- [ ] block exchange
- [ ] block histogram
- [ ] block load
- [ ] block merge sort
- [ ] block radix sort
- [ ] block reduce
- [ ] block run-length decode (rld)
- [ ] block scan
- [ ] block shuffle
- [ ] block store
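
As a starting point, here is a minimal sketch of what such an example might look like for block reduce. It is not taken from the existing docs. The block shape (8 x 4 x 2), the kernel name `block_sum_3d`, and the input data are assumptions chosen for illustration. The sketch relies on two things: `cub::BlockReduce` accepts the Y and Z block dimensions as template parameters after the algorithm parameter, and CUB ranks threads in row-major order with `threadIdx.x` varying fastest.

```cuda
#include <cub/block/block_reduce.cuh>
#include <cuda_runtime.h>

#include <cstdio>

// Hypothetical block shape for illustration: 8 x 4 x 2 = 64 threads.
constexpr int BLOCK_DIM_X = 8;
constexpr int BLOCK_DIM_Y = 4;
constexpr int BLOCK_DIM_Z = 2;

__global__ void block_sum_3d(const int* in, int* out)
{
  // All three block dimensions must be given as template arguments.
  // The algorithm argument has to be spelled out because it precedes
  // BLOCK_DIM_Y and BLOCK_DIM_Z in the parameter list.
  using BlockReduce = cub::BlockReduce<int,
                                       BLOCK_DIM_X,
                                       cub::BLOCK_REDUCE_WARP_REDUCTIONS,
                                       BLOCK_DIM_Y,
                                       BLOCK_DIM_Z>;

  __shared__ typename BlockReduce::TempStorage temp_storage;

  // Row-major linear rank of this thread (x fastest, then y, then z).
  const int linear_tid =
    (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x;

  // Each thread contributes one item, indexed by its linear rank.
  const int sum = BlockReduce(temp_storage).Sum(in[linear_tid]);

  // The aggregate is only valid in the thread with linear rank 0.
  if (linear_tid == 0)
  {
    *out = sum;
  }
}

int main()
{
  constexpr int n = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;

  int h_in[n];
  int expected = 0;
  for (int i = 0; i < n; ++i)
  {
    h_in[i] = i;
    expected += i;
  }

  int* d_in  = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, sizeof(int));
  cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

  // The launch shape must match the template arguments above.
  block_sum_3d<<<1, dim3(BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z)>>>(d_in, d_out);

  int h_out = 0;
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  std::printf("block sum = %d (expected %d)\n", h_out, expected);

  cudaFree(d_in);
  cudaFree(d_out);
  return h_out == expected ? 0 : 1;
}
```

The likely pitfall this illustrates is launching a multi-dimensional block while instantiating the algorithm with only `BLOCK_DIM_X`: the launch shape and the template arguments have to agree. For order-sensitive algorithms such as block scan, the row-major ranking also determines the order in which items are combined.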
If this is a correction, please provide a link to the incorrect documentation. If this is a new documentation request, please link to where you have looked.
No response