cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Using CUB device-wide primitives, I wrote an implementation of the algorithm described in "[Efficient Projections onto the ℓ1-Ball for Learning in High Dimensions](http://machinelearning.org/archive/icml2008/papers/361.pdf)" by John Duchi, Shai Shalev-Shwartz, Yoram Singer,...
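For reference, here is a minimal sequential host-side sketch of the sort-based projection described in the paper. It assumes a dense `std::vector<double>` input and a non-negative radius `z`. `project_l1_ball` is a hypothetical name, and this is not the poster's CUB implementation. The descending sort and the running sum are the steps one would presumably hand to CUB's device-wide sort and scan primitives. The last step is an elementwise soft-threshold by `theta`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Sequential sketch: Euclidean projection of v onto {w : ||w||_1 <= z}.
std::vector<double> project_l1_ball(const std::vector<double>& v, double z) {
    assert(z >= 0.0 && "radius must be non-negative");

    // Work on magnitudes; signs are restored at the end.
    std::vector<double> mu(v.size());
    double l1 = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        mu[i] = std::fabs(v[i]);
        l1 += mu[i];
    }
    if (l1 <= z) return v;                                    // already inside the ball
    if (z == 0.0) return std::vector<double>(v.size(), 0.0);  // ball is the origin

    // Step 1 (sort): magnitudes in descending order.
    std::sort(mu.begin(), mu.end(), std::greater<double>());

    // Step 2 (scan): rho = largest 1-based j with mu_j - (sum_{r<=j} mu_r - z) / j > 0.
    // For z > 0 the test holds at j = 1, so rho >= 1.
    double prefix = 0.0, prefix_at_rho = 0.0;
    std::size_t rho = 0;
    for (std::size_t j = 0; j < mu.size(); ++j) {
        prefix += mu[j];
        if (mu[j] - (prefix - z) / static_cast<double>(j + 1) > 0.0) {
            rho = j + 1;
            prefix_at_rho = prefix;
        }
    }
    const double theta = (prefix_at_rho - z) / static_cast<double>(rho);

    // Step 3: soft-threshold each magnitude by theta and restore the sign.
    std::vector<double> w(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        const double m = std::max(std::fabs(v[i]) - theta, 0.0);
        w[i] = std::copysign(m, v[i]);
    }
    return w;
}
```

For example, projecting `{3.0, -1.0, 0.5}` onto the ball of radius 1 gives `theta = 2` and the result `{1, 0, 0}`.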
I couldn't find a better place to ask this question, so pardon me for asking here: Is there a way to use keys of non-standard types for cub::DeviceRadixSort? So far...
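One common workaround, sketched here under stated assumptions, is to map each custom key to a built-in unsigned integer whose unsigned order matches the desired key order, sort those integers, and decode afterward. The `CompositeKey` struct and the `encode_key`/`decode_key` helpers are hypothetical and are not part of CUB. The encoded `std::uint64_t` keys could then be sorted with an ordinary unsigned-key radix sort such as `cub::DeviceRadixSort::SortPairs`, carrying the original indices as values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical custom key: order by `major` (signed), then by `minor`.
struct CompositeKey {
    std::int32_t major;
    std::uint32_t minor;
};

// Encode as a 64-bit unsigned integer whose natural order matches the
// lexicographic (major, minor) order.
std::uint64_t encode_key(const CompositeKey& k) {
    // Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX in order.
    const std::uint32_t major_bits = static_cast<std::uint32_t>(k.major) ^ 0x80000000u;
    return (static_cast<std::uint64_t>(major_bits) << 32) | k.minor;
}

// Inverse transform, used to recover the original keys after sorting.
CompositeKey decode_key(std::uint64_t bits) {
    const std::uint32_t major_bits = static_cast<std::uint32_t>(bits >> 32) ^ 0x80000000u;
    return CompositeKey{static_cast<std::int32_t>(major_bits),
                        static_cast<std::uint32_t>(bits & 0xFFFFFFFFu)};
}
```

The sign-bit flip only handles two's-complement integer fields. Floating-point fields would need a different order-preserving bit transform.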
I would like to be able to determine my device-wide primitive's temp_storage_bytes before I have all of the primitive's arguments ready. The interface for obtaining it ostensibly requires everything to...
@dumerrill the library is awesome! All documented results are out of date, and there are no results for the Pascal generation. You probably already have that code, but it isn't publicly available.
Prefix sums are incredibly useful, and CUB provides both inclusive and exclusive variants of device-wide and block-wide public APIs for these tools. However, the need for device-wide scans is not...
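To make the two variants concrete, here is a host-side sketch using the C++17 standard library rather than CUB. The helper names are hypothetical. On the device, `cub::DeviceScan::InclusiveSum` and `cub::DeviceScan::ExclusiveSum` play the corresponding roles. An inclusive scan includes element `i` in output `i`, while an exclusive scan starts from an initial value (here 0) and stops just before element `i`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i]
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive prefix sum: out[0] = 0, out[i] = in[0] + ... + in[i - 1]
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
}
```

For the input `{3, 1, 4, 1, 5}`, the inclusive result is `{3, 4, 8, 9, 14}` and the exclusive result is `{0, 3, 4, 8, 9}`.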
For some algorithms, it makes no sense to have a whole block of data in registers at once. For others, a local buffer is bad due to dynamic indexing. For...
An issue was reported regarding the internal accumulator type in `cub::DeviceScan::InclusiveSum`: it uses the input data type as the accumulator type. Here's the reproducer: ```cuda #include #include int...
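Why the accumulator type matters can be seen on the host without CUB. The following sketch does not reproduce CUB's internals or the reporter's reproducer. `inclusive_sum_with` is a hypothetical helper that keeps its running total in an explicitly chosen `AccumT`. With 8-bit inputs, an 8-bit accumulator wraps modulo 256, while a 32-bit accumulator produces the mathematically expected sums.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive prefix sum whose running total is stored in AccumT,
// independent of the input element type InputT.
template <typename AccumT, typename InputT>
std::vector<AccumT> inclusive_sum_with(const std::vector<InputT>& in) {
    std::vector<AccumT> out;
    out.reserve(in.size());
    AccumT running{};
    for (const InputT& x : in) {
        // Converting back to AccumT is where a too-narrow accumulator overflows.
        running = static_cast<AccumT>(running + x);
        out.push_back(running);
    }
    return out;
}
```

With the input `{200, 200}` as `std::uint8_t`, `inclusive_sum_with<std::uint8_t>` yields `{200, 144}` (400 wraps to 144), whereas `inclusive_sum_with<std::uint32_t>` yields `{200, 400}`.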
Fixes https://github.com/NVIDIA/cccl/issues/868
An internal user has reported a bug in `cub::DeviceHistogram`. When using 16-bit values, the computed temporary storage buffer size is too small on Pascal, leading to runtime errors. They've applied...
See discussion in #294 and #305. The same change should be applied to `cub::DeviceReduce::Reduce`.