cuda-kat
CUDA kernel author's tools
This is a general problem of `nvcc`, I would say:

```cpp
#include
#include

constexpr int duzzle = -7;

__global__ void kernel()
{
    kat::array arr;
    arr.fill(duzzle); // fails to compile
}
```
...
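The underlying issue seems to be nvcc's restriction on odr-using host-side `constexpr` variables in device code: `fill()` takes its argument by `const&` (just like `std::array::fill`), and binding that reference odr-uses `duzzle`. Below is a self-contained sketch of the same pattern without cuda-kat, plus one possible workaround; all names here are made up for illustration:

```cpp
constexpr int duzzle = -7;

struct toy_array {
    int data[3];
    // Takes its argument by const reference, like std::array::fill / kat::array::fill
    __device__ void fill(const int& value)
    {
        for (auto& e : data) { e = value; }
    }
};

__global__ void kernel()
{
    toy_array arr;
    // arr.fill(duzzle);     // error: binding a reference odr-uses the host constexpr variable
    arr.fill(int{duzzle});   // passing a prvalue copy only uses the value, which nvcc accepts
}
```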
While it's rarely a great idea, for the sake of completeness, we may want to add implementations of the abstract `<algorithm>` and `<numeric>` algorithms which could be run by...
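For a sense of what such an implementation might look like, here is a minimal sketch of a block-collaborative `for_each`; the function name and the strided work-splitting scheme are illustrative only, not existing cuda-kat API:

```cpp
#include <cstddef>

// Illustrative only: a for_each over [first, last) executed collaboratively by
// all threads of the calling block, each handling a strided subset of the range.
template <typename T, typename UnaryFunction>
__device__ void block_for_each(T* first, T* last, UnaryFunction f)
{
    auto length = static_cast<std::size_t>(last - first);
    for (std::size_t i = threadIdx.x; i < length; i += blockDim.x) {
        f(first[i]);
    }
}
```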
"Built-ins" in cuda-kat means those functions which translate into single PTX instructions (not necessarily single SASS instructions though!) We have `on_device/builtins.cuh`, and `on_device/non-builtins.cuh` which contains functions which are builtin-like, or...
We've already added support for some of the `<iterator>` methods for accessing ranges, like `std::begin()` and `std::end()`. But we haven't added any of their "reverse" variants, e.g. `std::rbegin()` and...
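The reverse variants can usually be layered on top of the existing accessors, roughly as follows. This is only a sketch, and it assumes `std::reverse_iterator` is usable from device code (e.g. with C++17 and `--expt-relaxed-constexpr`); otherwise a kat-side reverse iterator would be needed:

```cpp
#include <iterator>

namespace kat {

// Reverse-range accessors in the style of std::rbegin()/std::rend(),
// sketched for any container exposing begin()/end().
template <typename C>
__host__ __device__ auto rbegin(C& c)
{
    return std::reverse_iterator<decltype(c.end())>(c.end());
}

template <typename C>
__host__ __device__ auto rend(C& c)
{
    return std::reverse_iterator<decltype(c.begin())>(c.begin());
}

} // namespace kat
```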
Now that we have (half-)decent unit test coverage (see #24), we should introduce code coverage checks to see how much remains uncovered. This requires:

* Getting a coverage-related CMake module...
Beginning with CUDA 10 (or maybe 9?) we have three kinds of atomics:

* `atomicFoo()` - atomic w.r.t. other memory access from within the same GPU.
* `atomicFoo_system()` - atomic...
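Concretely, for addition the three scopes look like this; the kernel and parameter names are made up for the example, and the block- and system-scoped forms require compute capability 6.0 or higher:

```cpp
__global__ void count_events(int* device_counter, int* host_visible_counter)
{
    // Atomic with respect to all threads on the same GPU (the "classic" form)
    atomicAdd(device_counter, 1);

    // Atomic only with respect to threads in the same thread block
    atomicAdd_block(device_counter, 1);

    // Atomic with respect to the whole system - other GPUs and the CPU as well
    // (e.g. for counters in managed or pinned host memory)
    atomicAdd_system(host_visible_counter, 1);
}
```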
An index [is](https://www.merriam-webster.com/dictionary/index) either a "list of items" arranged in order, or "a number... used as an indicator or measure", or "a number ... associated with another to indicate... position...
Shuffles are warp collaboration primitives. They should be in namespace `kat::collaboration::warp` - and declared in the warp collaboration primitives header - even if only through an inclusion of another file.
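In that spirit, the wrappers might end up looking roughly like this; a sketch only, where the wrapper names under `kat::collaboration::warp` are assumptions, while the underlying `__shfl_*_sync()` intrinsics are the standard CUDA ones:

```cpp
namespace kat {
namespace collaboration {
namespace warp {

// Obtain the value of `value` held by the lane with index `source_lane`
template <typename T>
__device__ T shuffle(T value, int source_lane)
{
    return __shfl_sync(0xffffffffu /* full warp mask */, value, source_lane);
}

// Obtain the value held by the lane `delta` lanes below the calling lane
template <typename T>
__device__ T shuffle_up(T value, unsigned delta)
{
    return __shfl_up_sync(0xffffffffu /* full warp mask */, value, delta);
}

} // namespace warp
} // namespace collaboration
} // namespace kat
```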
We've adapted a tuple implementation; however, that tuple doesn't know that there's "another tuple" it needs to be compatible with... we _do_ know. So, let's try and make `kat::tuple` usable...
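One common way to get that kind of compatibility is to teach the standard library's tuple machinery about `kat::tuple`, e.g. by specializing `std::tuple_size` and `std::tuple_element`. The following is a sketch under the assumption that `kat::tuple` exposes its element types in the usual way; the header path is assumed and none of this is existing cuda-kat code:

```cpp
#include <cstddef>
#include <tuple>
#include <kat/tuple.hpp>   // assumed header path for kat::tuple

namespace std {

// Let standard tuple-aware utilities treat kat::tuple as a tuple-like type.
template <typename... Ts>
struct tuple_size<kat::tuple<Ts...>>
    : integral_constant<size_t, sizeof...(Ts)> { };

template <size_t I, typename... Ts>
struct tuple_element<I, kat::tuple<Ts...>> {
    using type = tuple_element_t<I, tuple<Ts...>>;
};

} // namespace std
```

For structured bindings and generic code calling `get`, an ADL-findable `get<I>()` for `kat::tuple` (or a member `get()`) would also be needed, which the adapted implementation presumably already provides in some form.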
The programming guide [says](https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html):

> **E.3.14.3. Rvalue references**
>
> By default, the CUDA compiler will implicitly consider `std::move` and `std::forward` function templates to have `__host__ __device__` execution space qualifiers,...
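In practice this means something like the following should compile in device code without any kat-side wrapper; a minimal illustration, where `swap_on_device` is just a made-up example function:

```cpp
#include <utility>

template <typename T>
__device__ void swap_on_device(T& a, T& b)
{
    // std::move is implicitly usable here, per the quoted passage -
    // no replacement is needed just to get move semantics in device code.
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}
```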