Allison Piper issues

Results: 60 issues of Allison Piper

Add release dates to changelog and readme

It would be convenient to make note of the release dates for each version in the table of releases in the README.md and the entries in CHANGELOG.md, both for Thrust...

type: enhancement

only: docs

P2: nice to have

Check for -0 / +0 stable_sort issues

PR NVIDIA/cub#218 fixes this in CUB's radix sort. We should:
- [ ] Check whether Thrust's other backends handle this case correctly.
- [ ] Provide a guarantee of this in...

type: bug: functional

P2: nice to have

good first issue
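
To make the -0 / +0 case in the stable_sort item above concrete, here is a minimal host-side sketch. It uses `std::stable_sort` from the C++ standard library rather than Thrust or CUB, and the function name `stable_sort_keeps_signed_zero_order` is a hypothetical helper, not part of either project. The idea is that `-0.0f` and `+0.0f` compare equal, so a stable sort should leave them in their input order.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not a Thrust or CUB API).
// Returns true when std::stable_sort keeps -0.0f and +0.0f in their input
// order. The two values compare equal under operator<, so a stable sort
// must not swap them.
inline bool stable_sort_keeps_signed_zero_order()
{
    std::vector<float> neg_first{-0.0f, 0.0f};
    std::vector<float> pos_first{0.0f, -0.0f};

    std::stable_sort(neg_first.begin(), neg_first.end());
    std::stable_sort(pos_first.begin(), pos_first.end());

    // std::signbit distinguishes the zeros even though -0.0f == 0.0f.
    const bool neg_first_ok =
        std::signbit(neg_first[0]) && !std::signbit(neg_first[1]);
    const bool pos_first_ok =
        !std::signbit(pos_first[0]) && std::signbit(pos_first[1]);
    return neg_first_ok && pos_first_ok;
}
```

A similar check could presumably be adapted per backend by swapping in `thrust::stable_sort`. A radix sort that orders keys by their bit patterns is the kind of implementation where the two zeros might be treated as distinct keys.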

Unified benchmarking framework for Thrust + CUB

After #1184 is merged.

type: enhancement

area: performance

P0: must have

Simplify the CUB dependency

Symlinks cannot be used in a cross-platform project. I'd like to remove the `cub -> dependencies/cub` symlink and have `dependencies/cub` become the one and only location of CUB inside of...

P3: backlog

`cuda::std::` support in NVC++

Thrust and CUB would like to use libcu++. However, these projects must support the nvc++ compiler, so they are blocked from using libcu++ features until it is usable with nvc++.

enhancement

P0: must have

compiler: nvc++

helps: quda

Move thread scopes into a separate header

The `thread_scope` enum is [gated behind the `atomic` header](https://github.com/NVIDIA/libcudacxx/blob/bda0c48d46ff7d0e3d9dea3240426efe56db6bc7/include/cuda/std/detail/__atomic#L65-L70). The `atomic` header [emits errors when used with certain SM versions](https://github.com/NVIDIA/libcudacxx/blob/bda0c48d46ff7d0e3d9dea3240426efe56db6bc7/include/cuda/std/detail/__atomic#L9-L11). `thread_scope` is useful outside of atomics for general scope labeling....

enhancement

P2: nice to have

Fix RDC flags on nvc++ builds.

nvcc defaults to rdc-off, while nvc++ defaults to rdc-on. We need to explicitly enable or disable these flags for each CUDA target, rather than only enabling them when needed.

only: cmake

blocked

type: bug: functional

compiler: nvc++

P0: must have

helps: nvc++

area: cmake

Transparent support for 64-bit indexing in device algorithms

# Summary
The user-friendly `cub::Device*` entry points into the CUB device algorithms assume that the problem size can be indexed with a 32-bit int. As evidenced by a slew of...

type: bug: functional

P0: must have

helps: quda
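
The 32-bit limitation described in the 64-bit indexing item above can be illustrated with a small host-side sketch. Everything here is an assumption made for illustration: `fits_in_int32`, `Chunk`, and `split_into_int32_chunks` are hypothetical names, not CUB APIs, and the issue text does not say that chunking is the intended fix. The sketch only shows when a problem size stops fitting in a 32-bit `int`, and how a caller might split a large range into pieces that do fit.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: true if num_items can be indexed with a 32-bit int.
inline bool fits_in_int32(std::uint64_t num_items)
{
    return num_items <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Hypothetical description of one sub-range of a large problem.
struct Chunk
{
    std::uint64_t offset; // 64-bit starting index into the full range
    std::int32_t  size;   // number of items, guaranteed to fit in 32 bits
};

// Splits [0, num_items) into consecutive chunks of at most max_chunk items.
// Returns an empty vector if max_chunk is not positive.
inline std::vector<Chunk> split_into_int32_chunks(
    std::uint64_t num_items,
    std::int32_t max_chunk = std::numeric_limits<std::int32_t>::max())
{
    std::vector<Chunk> chunks;
    if (max_chunk <= 0)
    {
        return chunks;
    }

    std::uint64_t offset = 0;
    while (offset < num_items)
    {
        const std::uint64_t remaining = num_items - offset;
        const std::int32_t size = static_cast<std::int32_t>(
            std::min<std::uint64_t>(remaining,
                                    static_cast<std::uint64_t>(max_chunk)));
        chunks.push_back(Chunk{offset, size});
        offset += static_cast<std::uint64_t>(size);
    }
    return chunks;
}
```

Chunking like this only works directly for operations whose pieces are independent. Algorithms such as reductions or scans would also need the per-chunk results combined, which is part of why transparent support inside the library is more convenient than a caller-side workaround.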

Implement `ptx_dispatch` for if-target compatible target specialization

type: enhancement

testing: gpuCI in progress

P0: must have

release: breaking change

Address issues with existing determinism guarantees

# Overview
Some `cub::Device*` algorithms are/were documented to be run-to-run deterministic, but the implementations no longer fulfill that guarantee. This has been a major pain point for several users who...

type: bug: functional

P0: must have

area: performance

helps: pytorch

release: breaking change

type: cleanup

area: docs

area: tests
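
As background for the determinism item above, one well-known reason the order of operations matters for run-to-run reproducibility is that floating-point addition is not associative. The truncated issue text does not say whether this is the cause of the problems it describes, so treat the following as a general illustration only. The function names are hypothetical.

```cpp
// Hypothetical helpers for illustration only.

// Sums three doubles grouping from the left: (a + b) + c.
inline double sum_left_to_right(double a, double b, double c)
{
    return (a + b) + c;
}

// Sums the same three doubles grouping from the right: a + (b + c).
inline double sum_right_to_left(double a, double b, double c)
{
    return a + (b + c);
}
```

With the inputs `1e20`, `-1e20`, and `1.0` under default IEEE 754 double arithmetic (no fast-math flags), the left-grouped sum is `1.0` while the right-grouped sum is `0.0`, because `-1e20 + 1.0` rounds back to `-1e20`. A parallel reduction whose grouping varies between runs can therefore return different results for identical inputs.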