cub issues

Draft for catch2 testing framework usage

5

CUB's testing facilities are far from perfect. This PR contains Catch2 integration and a few convenience wrappers. Main advantages of Catch2: a readable way of specifying Cartesian products...

gevtushenko

testing: gpuCI passed

P2: nice to have

Fix RDC flags on nvc++ builds.

1

nvcc defaults to RDC off, whereas nvc++ defaults to RDC on. We need to explicitly enable or disable these flags for each CUDA target, rather than only enabling them when needed.

alliepiper

only: cmake

blocked

type: bug: functional

compiler: nvc++

P0: must have

helps: nvc++

area: cmake

Dispatch mechanism may break when any two libraries that use CUB and/or Thrust have been compiled for different sets of GPU architectures

6

The following describes a problem observed in more "complex" software projects, where different components (or libraries) use CUB and/or Thrust without [separating CUB and/or Thrust through namespace customization](https://github.com/NVIDIA/thrust/issues/1401). This issue...

elstehle

type: bug: functional

P1: should have

SpMV with different matrix and vector types

3

Right now, the SpMV kernel lets users specify the matrix and vector type by specializing the `ValueT` type. In our case, the sparse matrix elements are stored in CRS format...

michaelmigliore
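
To make the SpMV request above concrete, here is a minimal host-side sketch of a CRS (CSR) matrix-vector product in which the matrix, input-vector, and output element types are chosen independently. It is not the `cub::DeviceSpmv` interface; the function name `csr_spmv` and the template parameters `MatrixT`, `VectorT`, and `OutputT` are hypothetical names introduced only for illustration, and accumulating in `OutputT` is an assumption about what mixed-type behavior would look like.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (host only, not part of CUB).
// MatrixT, VectorT and OutputT are illustrative, independently chosen types.
template <typename MatrixT, typename VectorT, typename OutputT>
std::vector<OutputT> csr_spmv(const std::vector<int>& row_offsets,
                              const std::vector<int>& column_indices,
                              const std::vector<MatrixT>& values,
                              const std::vector<VectorT>& x)
{
  assert(!row_offsets.empty());
  assert(column_indices.size() == values.size());

  const std::size_t num_rows = row_offsets.size() - 1;
  std::vector<OutputT> y(num_rows);

  for (std::size_t row = 0; row < num_rows; ++row)
  {
    OutputT sum{}; // accumulate in the output type (an assumption)
    for (int nz = row_offsets[row]; nz < row_offsets[row + 1]; ++nz)
    {
      // Convert both operands to OutputT before multiplying.
      sum += static_cast<OutputT>(values[nz]) *
             static_cast<OutputT>(x[column_indices[nz]]);
    }
    y[row] = sum;
  }
  return y;
}
```

For example, `csr_spmv<float, double, double>(...)` multiplies a `float` matrix by a `double` vector and accumulates in `double`, which a single shared `ValueT` cannot express.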

Deprecate `cub::DeviceSpmv`

`cub::DeviceSpmv` has been unmaintained for a while and probably contains [bugs](https://github.com/NVIDIA/cub/pull/352#discussion_r680580812). Moreover, there are better implementations in specialized libraries like cuSPARSE. I suggest we deprecate it.

gevtushenko

release: breaking change

Transparent support for 64-bit indexing in device algorithms

15

Summary: The user-friendly `cub::Device*` entry points into the CUB device algorithms assume that the problem size can be indexed with a 32-bit int. As evidenced by a slew of...

alliepiper

type: bug: functional

P0: must have

helps: quda
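
As a rough illustration of the 32-bit limit described in the 64-bit indexing issue above, the sketch below splits a 64-bit problem size into chunks whose item counts each fit in a 32-bit `int`. This is a generic caller-side workaround, not the approach proposed in the issue and not a CUB API; the helper name `split_for_int32` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of CUB): split [0, num_items) into
// (offset, count) chunks so that every count fits in a 32-bit int.
inline std::vector<std::pair<std::int64_t, int>>
split_for_int32(std::int64_t num_items,
                int max_chunk = std::numeric_limits<int>::max())
{
  assert(num_items >= 0 && max_chunk > 0);

  std::vector<std::pair<std::int64_t, int>> chunks;
  for (std::int64_t offset = 0; offset < num_items; offset += max_chunk)
  {
    const std::int64_t remaining = num_items - offset;
    // The chunk size is bounded by max_chunk, so the cast cannot overflow.
    const int count =
      static_cast<int>(std::min<std::int64_t>(remaining, max_chunk));
    chunks.emplace_back(offset, count);
  }
  return chunks;
}
```

Each chunk could then be handed to an entry point that takes an `int` item count. Chunking like this only composes cleanly for element-wise style operations; algorithms such as scans or sorts would need extra work to combine per-chunk results, which is part of why transparent library support is being asked for.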

Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__`

10

brycelelbach

type: enhancement

compiler: nvc++

P0: must have

Implement `ptx_dispatch` for if-target compatible target specialization

1

alliepiper

type: enhancement

testing: gpuCI in progress

P0: must have

release: breaking change

First version of developer overview

This PR briefly explains the current CUB design. The document is intended to help contributors. Coming PTX dispatch changes will lead to changes in this document. Having a diff of...

gevtushenko

only: docs

area: docs

Refine fallback kernel for segmented sort

1

Currently, `cub::DeviceSegmentedSort` has a fallback kernel that [applies](https://github.com/NVIDIA/cub/blob/0430cc0bfcb7c2496b42da754c215c9b5df8856b/cub/device/dispatch/dispatch_segmented_sort.cuh#L169) different algorithms for different segment sizes. In particular, medium-size segments are sorted by merge sort. If a segment doesn't fit into registers, it's...

gevtushenko

P2: nice to have
