
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Results 91 cub issues

CUB's testing facilities are far from perfect. This PR adds Catch2 integration and a few convenience wrappers. Main advantages of Catch2: - Readable way of specifying Cartesian products...

testing: gpuCI passed
P2: nice to have

nvcc defaults to rdc-off, whereas nvc++ defaults to rdc-on. We need to explicitly enable or disable these flags for each CUDA target, rather than only enabling them when needed.

only: cmake
blocked
type: bug: functional
compiler: nvc++
P0: must have
helps: nvc++
area: cmake

The following describes a problem observed in more "complex" software projects, where different components (or libraries) use CUB and/or Thrust without [separating CUB and/or Thrust through namespace customisation](https://github.com/NVIDIA/thrust/issues/1401). This issue...

type: bug: functional
P1: should have

Right now, the SpMV kernel lets you specify the matrix and vector type by specializing the `ValueT` type. In our case, our sparse matrix elements are stored in CRS format...
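For readers unfamiliar with the layout mentioned above, here is a minimal host-side sketch of a sparse matrix-vector product over CRS (also called CSR) storage. It is not the `cub::DeviceSpmv` implementation. The function name `csr_spmv` and the separate `MatrixT`/`VectorT` template parameters are hypothetical and used only for illustration; `cub::DeviceSpmv` itself is described above as exposing a single `ValueT`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not part of CUB): y = A * x with A in CRS form.
//   row_offsets    : num_rows + 1 entries; row r owns values[row_offsets[r] .. row_offsets[r + 1])
//   column_indices : column of each stored nonzero
//   values         : the stored nonzeros, row by row
template <typename MatrixT, typename VectorT>
std::vector<VectorT> csr_spmv(const std::vector<int>& row_offsets,
                              const std::vector<int>& column_indices,
                              const std::vector<MatrixT>& values,
                              const std::vector<VectorT>& x) {
  assert(!row_offsets.empty());
  assert(column_indices.size() == values.size());

  const std::size_t num_rows = row_offsets.size() - 1;
  std::vector<VectorT> y(num_rows, VectorT{});

  for (std::size_t row = 0; row < num_rows; ++row) {
    VectorT sum{};
    for (int k = row_offsets[row]; k < row_offsets[row + 1]; ++k) {
      const std::size_t nz = static_cast<std::size_t>(k);
      const std::size_t col = static_cast<std::size_t>(column_indices[nz]);
      // Accumulate in the vector type; the matrix element is converted first.
      sum += static_cast<VectorT>(values[nz]) * x[col];
    }
    y[row] = sum;
  }
  return y;
}
```

For the 2x3 matrix [[1, 0, 2], [0, 3, 0]], the inputs would be `row_offsets = {0, 2, 3}`, `column_indices = {0, 2, 1}` and `values = {1, 2, 3}`.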

`cub::DeviceSpmv` has been unmaintained for a while and probably contains [bugs](https://github.com/NVIDIA/cub/pull/352#discussion_r680580812). Moreover, there are better implementations in specialized libraries like cuSPARSE. I suggest we deprecate it.

release: breaking change

Summary: The user-friendly `cub::Device*` entry points into the CUB device algorithms assume that the problem size can be indexed with a 32-bit int. As evidenced by a slew of...

type: bug: functional
P0: must have
helps: quda
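The 32-bit indexing limit in the summary above can be illustrated on the host. The sketch below is not CUB code and not necessarily the fix adopted for this issue. The helpers `fits_in_int32` and `split_into_chunks` are hypothetical names introduced here. They show how a caller might detect a problem size that a 32-bit `int` cannot represent, and how such a size can be split into pieces that each fit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of CUB): true when a 64-bit item count can be
// passed through an interface that takes a 32-bit signed int.
inline bool fits_in_int32(std::int64_t num_items) {
  return num_items >= 0 &&
         num_items <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: split [0, num_items) into (offset, count) pieces whose
// counts never exceed max_chunk, so each piece is indexable with 32 bits.
inline std::vector<std::pair<std::int64_t, std::int32_t>> split_into_chunks(
    std::int64_t num_items,
    std::int32_t max_chunk = std::numeric_limits<std::int32_t>::max()) {
  assert(num_items >= 0 && max_chunk > 0);

  std::vector<std::pair<std::int64_t, std::int32_t>> chunks;
  for (std::int64_t offset = 0; offset < num_items; offset += max_chunk) {
    const std::int64_t remaining = num_items - offset;
    const std::int32_t count =
        remaining < max_chunk ? static_cast<std::int32_t>(remaining) : max_chunk;
    chunks.emplace_back(offset, count);
  }
  return chunks;
}
```

Chunking like this only works directly for operations whose elements are independent. Reductions, scans and sorts need an extra step to combine results across chunks.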

type: enhancement
testing: gpuCI in progress
P0: must have
release: breaking change

This PR briefly explains the current CUB design. The document is intended to help contributors. Coming PTX dispatch changes will lead to changes in this document. Having a diff of...

only: docs
area: docs

Currently, `cub::DeviceSegmentedSort` has a fallback kernel that [applies](https://github.com/NVIDIA/cub/blob/0430cc0bfcb7c2496b42da754c215c9b5df8856b/cub/device/dispatch/dispatch_segmented_sort.cuh#L169) different algorithms for different segment sizes. In particular, medium-size segments are sorted by merge sort. If a segment doesn't fit into registers, it's...

P2: nice to have
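The idea of choosing a sorting algorithm per segment size, as described for `cub::DeviceSegmentedSort` above, can be sketched on the host as follows. This is an illustration only, not the CUB kernel. The names `SegmentClass`, `classify_segment` and `segmented_sort`, the thresholds 16 and 256, and the choice of standard-library algorithms as stand-ins are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class SegmentClass { Small, Medium, Large };

// Hypothetical thresholds; the real kernel derives its limits from tuning
// parameters that are not reproduced here.
inline SegmentClass classify_segment(std::size_t size,
                                     std::size_t small_max = 16,
                                     std::size_t medium_max = 256) {
  if (size <= small_max) return SegmentClass::Small;
  if (size <= medium_max) return SegmentClass::Medium;
  return SegmentClass::Large;
}

// Sort each segment [offsets[i], offsets[i + 1]) of keys independently,
// picking the algorithm from the segment's size class.
inline void segmented_sort(std::vector<int>& keys,
                           const std::vector<std::size_t>& offsets) {
  assert(!offsets.empty() && offsets.back() <= keys.size());

  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    assert(offsets[i] <= offsets[i + 1]);
    auto first = keys.begin() + static_cast<std::ptrdiff_t>(offsets[i]);
    auto last = keys.begin() + static_cast<std::ptrdiff_t>(offsets[i + 1]);

    switch (classify_segment(static_cast<std::size_t>(last - first))) {
      case SegmentClass::Small:
        // Insertion sort: cheap for a handful of keys.
        for (auto it = first; it != last; ++it)
          std::rotate(std::upper_bound(first, it, *it), it, it + 1);
        break;
      case SegmentClass::Medium:
        // Stand-in for the merge sort path mentioned above.
        std::stable_sort(first, last);
        break;
      case SegmentClass::Large:
        // Stand-in for whatever handles segments too large for the other paths.
        std::sort(first, last);
        break;
    }
  }
}
```

With `keys = {3, 1, 2, 9, 8, 5}` and `offsets = {0, 3, 5, 6}`, each of the three segments is sorted in place without touching its neighbours.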