[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Results 91 cub issues

Using CUB device-wide primitives, I wrote an implementation of the algorithm described in "[Efficient Projections onto the ℓ1-Ball for Learning in High Dimensions](http://machinelearning.org/archive/icml2008/papers/361.pdf)" by John Duchi, Shai Shalev-Shwartz, Yoram Singer,...

type: enhancement
P2: nice to have
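
For readers unfamiliar with the algorithm named above, here is a minimal host-side sketch of the sort-based ℓ1-ball projection from the cited paper. It is a plain C++ reference, not the issue author's CUB implementation; the function name `project_l1_ball` and the use of `double` are assumptions made for illustration. The sort and the running sum are the steps that a device-wide sort and scan would presumably replace on the GPU.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: Euclidean projection of v onto the l1-ball of radius z (z > 0).
std::vector<double> project_l1_ball(const std::vector<double>& v, double z) {
    double norm1 = 0.0;
    for (double x : v) norm1 += std::fabs(x);
    if (norm1 <= z) return v;  // already inside the ball, nothing to do

    // Sort the magnitudes in descending order.
    std::vector<double> u(v.size());
    std::transform(v.begin(), v.end(), u.begin(),
                   [](double x) { return std::fabs(x); });
    std::sort(u.begin(), u.end(), std::greater<double>());

    // Find the largest index rho with u[rho] - (cumsum - z) / rho > 0
    // and keep the matching threshold theta.
    double cumsum = 0.0;
    double theta = 0.0;
    for (std::size_t j = 0; j < u.size(); ++j) {
        cumsum += u[j];
        const double t = (cumsum - z) / static_cast<double>(j + 1);
        if (u[j] - t > 0.0) theta = t;
    }

    // Soft-threshold every component by theta, preserving its sign.
    std::vector<double> w(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        w[i] = std::copysign(std::max(std::fabs(v[i]) - theta, 0.0), v[i]);
    }
    return w;
}
```

For example, projecting `{3.0, 1.0}` onto the ball of radius 2 gives a threshold of 1, so the result is `{2.0, 0.0}`.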

I couldn't find a better place to ask this question, so pardon me for asking here: Is there a way to use keys of non-standard types for cub::DeviceRadixSort? So far...

type: enhancement
P2: nice to have

I would like to be able to determine my device-wide primitive's temp_storage_bytes before I have all of the primitive's arguments ready. The interface for obtaining it ostensibly requires everything to...

type: enhancement
P1: should have
good first issue
area: docs

@dumerrill the library is awesome! However, all documented results are out of date, and there are no results for the Pascal generation. You probably already have that code, but it is not publicly available.

type: enhancement
P1: should have
area: docs

Prefix sums are incredibly useful, and CUB provides both inclusive and exclusive variants of device-wide and block-wide public APIs for these tools. However, the need for device-wide scans is not...

type: enhancement
P2: nice to have

For some algorithms it makes no sense to have a whole block of data in registers at once. For others a local buffer is bad due to dynamic indexing. For...

type: enhancement
P2: nice to have

An issue was reported regarding the internal accumulator type in `cub::DeviceScan::InclusiveSum`: the input data type is used as the accumulator type. Here's the reproducer: ```cuda #include #include int...

type: bug: functional
P1: should have
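
The class of bug described above can be illustrated on the host with standard C++ alone. The sketch below does not use CUB and is not the truncated reproducer from the issue; the function names `scan_narrow` and `scan_wide` are hypothetical. It relies on the fact that `std::partial_sum` accumulates in the input iterator's value type, so an 8-bit input wraps around even when the output type is wider.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Inclusive sum whose accumulator has the input type (std::uint8_t).
// std::partial_sum keeps its running total in the input value type,
// so the total wraps modulo 256 before it is written to the int output.
std::vector<int> scan_narrow(const std::vector<std::uint8_t>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Inclusive sum with an explicitly wider accumulator (int).
std::vector<int> scan_wide(const std::vector<std::uint8_t>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    int acc = 0;
    for (std::uint8_t v : in) {
        acc += v;
        out.push_back(acc);
    }
    return out;
}
```

With the input `{200, 100}`, `scan_narrow` yields `{200, 44}` because 300 does not fit in 8 bits, while `scan_wide` yields `{200, 300}`.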

Fixes https://github.com/NVIDIA/cccl/issues/868

P3: backlog
helps: pytorch

An internal user has reported a bug in `cub::DeviceHistogram`. When using 16-bit values, the computed temporary storage buffer size is too small on Pascal, leading to runtime errors. They've applied...

type: bug: functional
P1: should have

See discussion in #294 and #305. The same change should be applied to `cub::DeviceReduce::Reduce`.

type: enhancement
P2: nice to have