cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
CUB's current testing facilities are far from perfect. This PR contains Catch2 integration and a few convenience wrappers. Main advantages of Catch2: - Readable way of specifying cartesian products...
nvcc defaults to rdc-off, whereas nvc++ defaults to rdc-on. We need to explicitly enable or disable these flags for each CUDA target, rather than just enabling them when needed.
The following describes a problem observed in more "complex" software projects, where different components (or libraries) use CUB and/or Thrust without [separating CUB and/or Thrust through namespace customization](https://github.com/NVIDIA/thrust/issues/1401). This issue...
Right now, the SpMV kernel allows specifying the matrix and vector type by specializing the `ValueT` type. In our case, our sparse matrix elements are stored in CRS format...
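For readers unfamiliar with the CRS (also called CSR) layout mentioned above, here is a minimal host-side sketch of a sparse matrix-vector product. It is not the `cub::DeviceSpmv` API. The function name `csr_spmv` and the choice to give the matrix and the vector separate template parameters (`MatrixT`, `VectorT`) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, NOT the cub::DeviceSpmv interface.
// MatrixT and VectorT are separate template parameters (an assumption for
// illustration), in contrast to a single ValueT shared by both.
template <typename MatrixT, typename VectorT, typename OffsetT = int>
std::vector<VectorT> csr_spmv(const std::vector<OffsetT>& row_offsets,    // num_rows + 1 entries
                              const std::vector<OffsetT>& column_indices, // one entry per non-zero
                              const std::vector<MatrixT>& values,         // one entry per non-zero
                              const std::vector<VectorT>& x)              // dense input vector
{
  assert(!row_offsets.empty());
  assert(column_indices.size() == values.size());

  const std::size_t num_rows = row_offsets.size() - 1;
  std::vector<VectorT> y(num_rows, VectorT{});

  for (std::size_t row = 0; row < num_rows; ++row) {
    VectorT sum{};
    // Non-zeros of this row live in [row_offsets[row], row_offsets[row + 1]).
    for (OffsetT i = row_offsets[row]; i < row_offsets[row + 1]; ++i) {
      sum += static_cast<VectorT>(values[i]) * x[column_indices[i]];
    }
    y[row] = sum;
  }
  return y;
}
```

For the 2x3 matrix `[[1, 0, 2], [0, 3, 0]]`, the arrays are `row_offsets = {0, 2, 3}`, `column_indices = {0, 2, 1}` and `values = {1, 2, 3}`.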
`cub::DeviceSpmv` has been unmaintained for a while and probably contains [bugs](https://github.com/NVIDIA/cub/pull/352#discussion_r680580812). Moreover, there are better implementations in specialized libraries like cuSPARSE. I suggest we deprecate it.
# Summary

The user-friendly `cub::Device*` entry points into the CUB device algorithms assume that the problem size can be indexed with a 32-bit int. As evidenced by a slew of...
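The 32-bit assumption can be made concrete with a small host-side guard. This is a sketch only: `fits_in_int32` and `num_chunks` are hypothetical helpers, not part of CUB. Splitting work into chunks is one possible caller-side workaround, and it only applies to algorithms whose input can be processed piecewise.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when num_items can be indexed with a signed 32-bit int.
constexpr bool fits_in_int32(std::int64_t num_items)
{
  return num_items >= 0 &&
         num_items <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: number of chunks needed so that every chunk holds at
// most max_chunk items. Assumes num_items >= 0 and max_chunk > 0, and that
// num_items + max_chunk - 1 does not overflow std::int64_t.
constexpr std::int64_t num_chunks(
  std::int64_t num_items,
  std::int64_t max_chunk = std::numeric_limits<std::int32_t>::max())
{
  return (num_items + max_chunk - 1) / max_chunk;
}
```

A caller holding a 64-bit `num_items` could check `fits_in_int32(num_items)` before passing it on, instead of letting the value be silently truncated.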
This PR briefly explains the current CUB design. The document is intended to help contributors. Coming PTX dispatch changes will lead to changes in this document. Having a diff of...
Currently, `cub::DeviceSegmentedSort` has a fallback kernel that [applies](https://github.com/NVIDIA/cub/blob/0430cc0bfcb7c2496b42da754c215c9b5df8856b/cub/device/dispatch/dispatch_segmented_sort.cuh#L169) different algorithms for different segment sizes. In particular, medium-size segments are sorted by merge sort. If a segment doesn't fit into registers, it's...
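The idea of choosing an algorithm per segment size can be sketched on the host as a simple classification step. This is not the CUB dispatch code. The names `SegmentClass` and `classify_segment`, and the thresholds passed in, are hypothetical. Only the mapping of medium segments to merge sort comes from the description above.

```cpp
#include <cassert>

// Hypothetical size classes. Per the description above, Medium segments
// would be handled by merge sort; the handling of Small and Large segments
// is left unspecified in this sketch.
enum class SegmentClass { Small, Medium, Large };

// Classify the segment [begin, end) by its length. small_max and medium_max
// are illustrative thresholds, not values taken from CUB.
inline SegmentClass classify_segment(int begin, int end, int small_max, int medium_max)
{
  assert(begin <= end);
  assert(small_max <= medium_max);

  const int size = end - begin;
  if (size <= small_max) {
    return SegmentClass::Small;
  }
  if (size <= medium_max) {
    return SegmentClass::Medium;
  }
  return SegmentClass::Large;
}
```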