Jake Hemstad

209 comments of Jake Hemstad

> If there's multiple choices for a kernel, the CUDA runtime seems to choose any qualifying kernel candidate "at random". Let me make sure I'm following what's going on here....

This piqued my curiosity and I went far down a rabbit hole. TL;DR: There is something extremely odd going on here that I don't understand and just making the kernel...

Yep, we ran into this in RMM a while back: https://github.com/rapidsai/rmm/issues/410 You might consider using `cudaMallocAsync` instead.
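For context, a minimal sketch of what switching to stream-ordered allocation looks like (requires CUDA 11.2+; the sizes and stream usage here are illustrative, not from the linked issue):

```cuda
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  void* ptr = nullptr;
  // Allocation is ordered on `stream` and drawn from the device's default
  // memory pool, avoiding the device-wide synchronization that
  // cudaMalloc/cudaFree can trigger.
  cudaMallocAsync(&ptr, 1 << 20, stream);

  // ... enqueue kernels using `ptr` on `stream` ...

  // The free is also stream-ordered; memory returns to the pool once
  // previously enqueued work on `stream` completes.
  cudaFreeAsync(ptr, stream);
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}
```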

> It is certainly true that adding int64_t instantiations increases compile time, and that they come with a non-trivial performance penalty. In pytorch land we are working around both these...

Couldn't the index type also be inferred from `std::iterator_traits::difference_type`? That might annoy existing users passing in raw pointers, since it would then default to 64-bit indices (`ptrdiff_t`). CUB could...

> Use unsigned offsets instead of signed -- since we'll be porting these algorithms incrementally, we can afford to spend some time fixing any issues that arise from the change...

> I'm all in for using existing building blocks. The problem is that I didn't assume the pointers to be aligned and so had to devise special treatment to be...

Huge +1 from me. I experimented with using Catch2 in [cuCollections](https://github.com/NVIDIA/cuCollections/tree/dev/tests) and I have loved it. I know CUB doesn't currently use GTest, but many of my favorite features...

> Have you encountered an issue related to Catch2 usage in `.cu` files? All the test files in cuCollections are `.cu` files: https://github.com/NVIDIA/cuCollections/tree/dev/tests There is one warning that's generated when...

Indeed, I believe the nvcc frontend has special handling for that attribute expansion. clang would need to emulate that "special" handling :slightly_smiling_face: