cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
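As a taste of what these cooperative primitives look like, here is a minimal block-wide reduction using `cub::BlockReduce`; the kernel name and the choice of 128 threads per block are illustrative.

```cuda
#include <cub/cub.cuh>

// Each block of 128 threads reduces its 128 inputs to a single per-block sum.
__global__ void block_sum(const int *in, int *out)
{
    using BlockReduce = cub::BlockReduce<int, 128>;
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_value = in[blockIdx.x * 128 + threadIdx.x];
    int block_total  = BlockReduce(temp_storage).Sum(thread_value);

    // Only thread 0 holds the valid block-wide aggregate.
    if (threadIdx.x == 0)
    {
        out[blockIdx.x] = block_total;
    }
}
```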
To give the user some clue about what's happening if the program gets compiled on a node with no GPU, or if it gets compiled with a different compute capability than the one...
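A minimal host-side sketch of the kind of check this refers to (not the actual CUB diagnostic): report when no CUDA device is visible, and print the device's compute capability so a mismatch with the compiled architectures is easier to spot. The function name is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative only; not CUB's actual error reporting. Prints a hint when no
// GPU is visible, or shows the device's compute capability so the user can
// compare it against the -arch the binary was built for.
bool report_device_status()
{
    int device_count = 0;
    cudaError_t status = cudaGetDeviceCount(&device_count);

    if (status != cudaSuccess || device_count == 0)
    {
        std::fprintf(stderr,
                     "No CUDA device detected (%s); device code cannot run.\n",
                     cudaGetErrorString(status));
        return false;
    }

    int major = 0, minor = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
    std::fprintf(stderr,
                 "Device 0 has compute capability %d.%d; make sure the binary "
                 "was compiled for this architecture.\n",
                 major, minor);
    return true;
}
```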
Before porting to CUB, the Thrust implementation of merge sort didn't have a `*copy` version. When introducing the `Copy` overload, I followed the generic CUB scheme of selecting the output iterator value...
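For context, the scheme referred to here picks the value type from the output iterator and falls back to the input iterator's value type when the output iterator reports `void` (as discard-style output iterators do). A sketch of one common form of that selection, with `InputIteratorT` / `OutputIteratorT` as placeholder template parameter names:

```cuda
#include <iterator>
#include <type_traits>

// Prefer the output iterator's value_type; fall back to the input iterator's
// value_type when the output reports void. Names here are placeholders.
template <typename InputIteratorT, typename OutputIteratorT>
struct selected_value
{
    using output_t = typename std::iterator_traits<OutputIteratorT>::value_type;
    using input_t  = typename std::iterator_traits<InputIteratorT>::value_type;

    using type = typename std::conditional<std::is_same<output_t, void>::value,
                                           input_t,
                                           output_t>::type;
};
```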
This PR addresses the following [issue](https://github.com/NVIDIA/cccl/issues/902) by replacing `__launch_bounds__` usages with `CUB_DETAIL_LAUNCH_BOUNDS`. `CUB_DETAIL_LAUNCH_BOUNDS` leads to `__launch_bounds__` usage only when RDC is **not** specified. Builds without RDC are not affected by...
Specifying `__launch_bounds__` in the presence of RDC has proven to be troublesome and unreliable. We have to abstract it out so that launch bounds are not specified when RDC is...
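A minimal sketch of how such a macro can be wired up, assuming nvcc's `__CUDACC_RDC__` predefine is available to detect RDC builds; the actual definition introduced by the PR may differ.

```cuda
// Sketch only: expand to __launch_bounds__ unless the translation unit is
// being compiled with relocatable device code (-rdc=true), in which case the
// annotation is dropped. nvcc defines __CUDACC_RDC__ for RDC builds.
#if defined(__CUDACC_RDC__)
#  define CUB_DETAIL_LAUNCH_BOUNDS(...)
#else
#  define CUB_DETAIL_LAUNCH_BOUNDS(...) __launch_bounds__(__VA_ARGS__)
#endif

// Usage: the launch-bounds annotation only takes effect in non-RDC builds.
template <int BLOCK_THREADS>
__global__ void CUB_DETAIL_LAUNCH_BOUNDS(BLOCK_THREADS) example_kernel(int *data)
{
    data[threadIdx.x] += 1;
}
```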
Currently, `BlockRadixRankMatchEarlyCounts` doesn't work in some specific cases (1...
Currently, we have a set of block radix rank facilities: `BlockRadixRank`, `BlockRadixRankMatch`, and `BlockRadixRankMatchEarlyCounts`. There's also an `enum BlockScanAlgorithm` that describes the differences between these algorithms. Unlike the...
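For comparison, `BlockScan` already exposes its algorithm choice through the `BlockScanAlgorithm` enum template parameter, whereas the radix rank variants above are separate classes. A small example of the existing `BlockScan` usage:

```cuda
#include <cub/cub.cuh>

// BlockScan selects its algorithm through the BlockScanAlgorithm enum
// template parameter; the radix rank facilities are separate classes instead.
__global__ void prefix_sum_kernel(int *data)
{
    constexpr int BLOCK_THREADS = 128;
    using BlockScan = cub::BlockScan<int, BLOCK_THREADS, cub::BLOCK_SCAN_WARP_SCANS>;

    __shared__ typename BlockScan::TempStorage temp_storage;

    int thread_value = data[threadIdx.x];
    BlockScan(temp_storage).ExclusiveSum(thread_value, thread_value);
    data[threadIdx.x] = thread_value;
}
```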
At https://github.com/nvidia/cub/blob/main/cub/block/block_reduce.cuh#L135, the image shown is the one for block_scan.
## Current Situation

As discussed in https://github.com/NVIDIA/cub/issues/545, CUB needs to query the current device's compute capability in order to know which tuning policy to use for launching the kernel. Currently,...
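A sketch of the kind of cached query this might involve (not CUB's actual mechanism): look the compute capability up once per device via `cudaDeviceGetAttribute` and reuse the result on later calls. The function name, encoding, and device limit are assumptions.

```cuda
#include <cuda_runtime.h>

// Illustrative cached compute-capability lookup; thread safety is omitted.
cudaError_t get_cached_sm_version(int device, int &sm_version)
{
    static int cache[64]; // 0 means "not yet queried"; arbitrary size limit

    if (device < 0 || device >= 64)
    {
        return cudaErrorInvalidDevice;
    }

    if (cache[device] == 0)
    {
        int major = 0, minor = 0;
        cudaError_t error =
            cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
        if (error != cudaSuccess) return error;
        error = cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device);
        if (error != cudaSuccess) return error;

        cache[device] = major * 100 + minor * 10; // e.g. 800 for sm_80
    }

    sm_version = cache[device];
    return cudaSuccess;
}
```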
I'd like to investigate implementing a reduction for associative but non-commutative operations. Related to https://github.com/NVIDIA/thrust/issues/1434. This kind of algorithm comes in handy when establishing the global context in parsing...
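A small host-side illustration (an assumed example, not taken from the issue) of why operand order matters for such operators: the combine step used for parenthesis matching is associative but not commutative, so a parallel reduction must preserve the original order of the segments.

```cuda
#include <algorithm>
#include <string>

// Per-segment parsing state: the net depth change (delta) and the minimum
// depth reached over all prefixes of the segment.
struct ParenState
{
    int min_depth;
    int delta;
};

// Associative but NOT commutative: rhs is interpreted as the text that comes
// after lhs, so its depths are shifted by lhs.delta.
ParenState combine(ParenState lhs, ParenState rhs)
{
    return {std::min(lhs.min_depth, lhs.delta + rhs.min_depth),
            lhs.delta + rhs.delta};
}

ParenState state_of(char c)
{
    if (c == '(') return {0, 1};   // depth never dips below 0 inside "("
    if (c == ')') return {-1, -1}; // depth dips to -1 inside ")"
    return {0, 0};
}

// A string is balanced iff the reduced state has delta == 0 and
// min_depth >= 0; permuting the operands, as a commutative reduction may do,
// would change the answer.
bool balanced(const std::string &s)
{
    ParenState total{0, 0};
    for (char c : s)
    {
        total = combine(total, state_of(c));
    }
    return total.delta == 0 && total.min_depth >= 0;
}
```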