Allison Piper

60 issues by Allison Piper

It would be convenient to make note of the release dates for each version in the table of releases in the README.md and the entries in CHANGELOG.md, both for Thrust...

type: enhancement
only: docs
P2: nice to have

PR NVIDIA/cub#218 fixes this in CUB's radix sort. We should:

- [ ] Check whether Thrust's other backends handle this case correctly.
- [ ] Provide a guarantee of this in...

type: bug: functional
P2: nice to have
good first issue

After #1184 is merged.

type: enhancement
area: performance
P0: must have

Symlinks cannot be used in a cross-platform project. I'd like to remove the `cub -> dependencies/cub` symlink and have `dependencies/cub` become the one and only location of CUB inside of...

P3: backlog

Thrust and CUB would like to use libcu++. However, these projects must support the nvc++ compiler, so they are blocked from using libcu++ features until it is usable with nvc++.

enhancement
P0: must have
compiler: nvc++
helps: quda

The `thread_scope` enum is [gated behind the `atomic` header](https://github.com/NVIDIA/libcudacxx/blob/bda0c48d46ff7d0e3d9dea3240426efe56db6bc7/include/cuda/std/detail/__atomic#L65-L70). The `atomic` header [emits errors when used with certain SM versions](https://github.com/NVIDIA/libcudacxx/blob/bda0c48d46ff7d0e3d9dea3240426efe56db6bc7/include/cuda/std/detail/__atomic#L9-L11). `thread_scope` is useful outside of atomics for general scope labeling....

enhancement
P2: nice to have

nvcc defaults to RDC (relocatable device code) off, while nvc++ defaults to RDC on. We need to explicitly enable or disable RDC for each CUDA target, rather than only enabling it when needed.

only: cmake
blocked
type: bug: functional
compiler: nvc++
P0: must have
helps: nvc++
area: cmake

# Summary

The user-friendly `cub::Device*` entry points into the CUB device algorithms assume that the problem size can be indexed with a 32-bit int. As evidenced by a slew of...

type: bug: functional
P0: must have
helps: quda
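
To make the 32-bit limit concrete: a signed 32-bit `int` can hold at most 2^31 - 1 (2,147,483,647), so any problem with more items than that cannot be passed as an `int` size without overflowing. The host-side C++ sketch below only illustrates that bound; `fits_in_int32` is a hypothetical helper for illustration, not part of CUB's API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUB API): true if a problem size can be
// represented by a signed 32-bit int without overflow.
constexpr bool fits_in_int32(std::uint64_t num_items) {
  return num_items <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// One million items fits; three billion items does not.
static_assert(fits_in_int32(1000000ULL), "1M items fits in int32");
static_assert(!fits_in_int32(3000000000ULL), "3B items overflows int32");
```

A 64-bit size type (for example `std::int64_t` or `std::size_t`) removes this particular ceiling, at the cost of wider index arithmetic.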

type: enhancement
testing: gpuCI in progress
P0: must have
release: breaking change

# Overview

Some `cub::Device*` algorithms are/were documented to be run-to-run deterministic, but the implementations no longer fulfill that guarantee. This has been a major pain point for several users who...

type: bug: functional
P0: must have
area: performance
helps: pytorch
release: breaking change
type: cleanup
area: docs
area: tests
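
One general reason floating-point reductions can lose run-to-run determinism is that floating-point addition is not associative: if the order in which values or partial results are combined can change between runs, the rounded result can change too. The C++ sketch below is only a host-side illustration of that effect, not CUB code; `sum_forward` and `sum_backward` are hypothetical helpers that add the same three `float` values in two different orders.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: sums left to right, ((v[0] + v[1]) + v[2]) + ...
template <std::size_t N>
constexpr float sum_forward(const std::array<float, N>& v) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < N; ++i) acc += v[i];
  return acc;
}

// Hypothetical helper: sums the same values right to left.
template <std::size_t N>
constexpr float sum_backward(const std::array<float, N>& v) {
  float acc = 0.0f;
  for (std::size_t i = N; i > 0; --i) acc += v[i - 1];
  return acc;
}

// Near 1e8 adjacent floats are 8 apart, so 1e8f + 1.0f rounds back to 1e8f.
constexpr std::array<float, 3> values{{1.0f, 1e8f, -1e8f}};
static_assert(sum_forward(values) == 0.0f, "1 is absorbed into 1e8 first");
static_assert(sum_backward(values) == 1.0f, "1e8 cancels before 1 is added");
```

A deterministic reduction has to fix the combination order (or use an order-independent accumulator) so that the same input always produces the same rounding.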