[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Results 91 cub issues

To give the user some clue about what's happening if the program is compiled on a node with no GPU, or if it is compiled for a different compute capability than the one...

Before it was ported to CUB, the Thrust implementation of merge sort did not have a `*copy` version. When introducing the `Copy` overload, I followed the CUB generic scheme of selecting output iterator value...

type: bug: functional

This PR addresses the following [issue](https://github.com/NVIDIA/cccl/issues/902) by replacing `__launch_bounds__` usages with `CUB_DETAIL_LAUNCH_BOUNDS`. `CUB_DETAIL_LAUNCH_BOUNDS` leads to `__launch_bounds__` usage only when RDC is **not** specified. Builds without RDC are not affected by...

testing: gpuCI in progress
type: bug: compiler

Specifying `__launch_bounds__` in the presence of RDC has proven to be troublesome and unreliable. We have to abstract it out so that launch bounds are not specified when RDC is...
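The abstraction described above can be sketched as a macro that expands to `__launch_bounds__` only when relocatable device code is off. This is a minimal illustration, not CUB's actual definition of `CUB_DETAIL_LAUNCH_BOUNDS`: the names `EXAMPLE_LAUNCH_BOUNDS`, `EXAMPLE_HAS_LAUNCH_BOUNDS`, `EXAMPLE_KERNEL`, and `example_scale` are hypothetical. It assumes `__CUDACC_RDC__` is the macro nvcc defines in RDC mode. So that the sketch also builds with a plain host compiler, every CUDA-specific piece falls back to ordinary C++.

```cpp
#include <cstddef>

// Hypothetical macro for illustration; CUB's real CUB_DETAIL_LAUNCH_BOUNDS may differ.
// Launch bounds are emitted only when compiling with nvcc and RDC is NOT enabled.
#if defined(__CUDACC__) && !defined(__CUDACC_RDC__)
#  define EXAMPLE_LAUNCH_BOUNDS(...) __launch_bounds__(__VA_ARGS__)
#  define EXAMPLE_HAS_LAUNCH_BOUNDS 1
#else
#  define EXAMPLE_LAUNCH_BOUNDS(...)
#  define EXAMPLE_HAS_LAUNCH_BOUNDS 0
#endif

// Hypothetical helper so the same source compiles as CUDA or as plain C++.
#if defined(__CUDACC__)
#  define EXAMPLE_KERNEL __global__
#else
#  define EXAMPLE_KERNEL
#endif

constexpr int example_block_threads = 256;

// With RDC (or on the host) the launch-bounds annotation disappears entirely.
EXAMPLE_KERNEL void EXAMPLE_LAUNCH_BOUNDS(example_block_threads)
example_scale(float* data, std::size_t n, float factor)
{
#if defined(__CUDACC__)
  // Device path: one element per thread.
  std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
  if (i < n)
  {
    data[i] *= factor;
  }
#else
  // Host fallback: a serial loop with the same effect.
  for (std::size_t i = 0; i < n; ++i)
  {
    data[i] *= factor;
  }
#endif
}
```

Keeping the decision in one macro means every kernel declaration stays unchanged when the RDC setting flips; only the macro's expansion differs.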

nvbug
P0: must have

Currently, `BlockRadixRankMatchEarlyCounts` doesn't work in some specific cases `(1

Currently, we have a set of block radix rank facilities:

- `BlockRadixRank`
- `BlockRadixRankMatch`
- `BlockRadixRankMatchEarlyCounts`

There's also an `enum BlockScanAlgorithm` that describes the differences between these algorithms. Unlike the...

P3: backlog

The image referenced at https://github.com/nvidia/cub/blob/main/cub/block/block_reduce.cuh#L135 is for block_scan

## Current Situation

As discussed in https://github.com/NVIDIA/cub/issues/545, CUB needs to query the current device's compute capability in order to know which tuning policy to use for launching the kernel. Currently,...

type: enhancement
P2: nice to have

I'd like to investigate implementing a reduction for associative but non-commutative operations.

Related to https://github.com/NVIDIA/thrust/issues/1434

This kind of algorithm comes in handy when establishing the global context in parsing...
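To make the ordering requirement concrete, here is a minimal host-side sketch of an order-preserving chunked reduction. It is not a CUB or Thrust API: the function `ordered_chunked_reduce` and its parameters are hypothetical, and the contiguous chunks only stand in for the per-block partial results a GPU implementation would produce. The point is that an associative operator permits regrouping the operands, while a non-commutative one forbids reordering them, so partials must be combined left to right.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Reduces `in` with an associative (not necessarily commutative) `op`.
// `identity` must satisfy op(identity, x) == x.
template <class T, class Op>
T ordered_chunked_reduce(const std::vector<T>& in, T identity, Op op, std::size_t chunk)
{
  assert(chunk > 0);

  // Phase 1: reduce each contiguous chunk left to right.
  // Each chunk could be handled independently (e.g. by one thread block).
  std::vector<T> partials;
  for (std::size_t begin = 0; begin < in.size(); begin += chunk)
  {
    const std::size_t end = std::min(begin + chunk, in.size());
    T acc = in[begin];
    for (std::size_t i = begin + 1; i < end; ++i)
    {
      acc = op(acc, in[i]);
    }
    partials.push_back(acc);
  }

  // Phase 2: combine the partials in their original order.
  T result = identity;
  for (const T& p : partials)
  {
    result = op(result, p);
  }
  return result;
}
```

String concatenation is a convenient test operator because it is associative but not commutative: reducing `{"a", "b", "c", "d", "e"}` with a chunk size of 2 yields `"abcde"` only if both phases keep the operands in order.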