cuda-kat

CUDA kernel author's tools

Results 65 cuda-kat issues

This is a general problem of `nvcc`, I would say:

```cpp
#include
#include
constexpr int duzzle = -7;
__global__ void kernel() {
    kat::array arr;
    arr.fill(duzzle); // fails to compile
}
```
...

question

While it's rarely a great idea, for the sake of completeness, we may want to have implementations of the Add abstract `` and `` algorithms which could be run by...

question

"Built-ins" in cuda-kat means those functions which translate into single PTX instructions (though not necessarily single SASS instructions). We have `on_device/builtins.cuh`, and `on_device/non-builtins.cuh`, which contains functions which are builtin-like, or...

Task

We've already added support for some of the `<iterator>` methods for accessing ranges, like `std::begin()` and `std::end()`. But - we haven't added any of their "reverse" variants, e.g. `std::rbegin()` and...
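To make the request concrete, here is a minimal sketch of what device-usable "reverse" range access could look like. Everything in it is an assumption made for illustration and not cuda-kat's actual code: the `sketch` namespace, the `SKETCH_HD` macro, and the `reverse_ptr` type are all hypothetical. A hand-rolled reverse iterator is used for arrays because `std::reverse_iterator` is not guaranteed to be callable from device code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: CUDA execution-space qualifiers under nvcc,
// nothing when compiled as plain host C++.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {

// Minimal reverse iterator over a raw pointer range (hypothetical type).
template <typename T>
struct reverse_ptr {
    T* base;  // one past the element this iterator refers to
    SKETCH_HD T& operator*() const { return *(base - 1); }
    SKETCH_HD reverse_ptr& operator++() { --base; return *this; }
    SKETCH_HD bool operator!=(const reverse_ptr& other) const { return base != other.base; }
};

// Built-in arrays: start at the last element, stop before the first.
template <typename T, std::size_t N>
SKETCH_HD reverse_ptr<T> rbegin(T (&arr)[N]) { return reverse_ptr<T>{arr + N}; }

template <typename T, std::size_t N>
SKETCH_HD reverse_ptr<T> rend(T (&arr)[N]) { return reverse_ptr<T>{arr}; }

// Containers: forward to the rbegin()/rend() members when they exist.
template <typename C>
SKETCH_HD auto rbegin(C& c) -> decltype(c.rbegin()) { return c.rbegin(); }

template <typename C>
SKETCH_HD auto rend(C& c) -> decltype(c.rend()) { return c.rend(); }

}  // namespace sketch

// Usage: walk an array back to front, building a number from its digits.
inline int reversed_digits_example() {
    int digits[3] = {1, 2, 3};
    int value = 0;
    for (auto it = sketch::rbegin(digits); it != sketch::rend(digits); ++it) {
        value = value * 10 + *it;
    }
    assert(value == 321);
    return value;
}
```

The container overloads only compile in device code if the container's own `rbegin()` and `rend()` members are device-callable, which this sketch does not check.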

Task

Now that we have (half-)decent unit test coverage (see #24), we should introduce code coverage checks to see how much remains uncovered. This requires: * Getting a coverage-related CMake module...

Task

Beginning with CUDA 10 (or maybe 9?) we have three kinds of atomics: * `atomicFoo()` - atomic w.r.t. other memory access from within the same GPU. * `atomicFoo_system()` - atomic...

enhancement
Task

An index [is](https://www.merriam-webster.com/dictionary/index) either a "list of items" arranged in order, or "a number... used as an indicator or measure", or "a number ... associated with another to indicate... position...

question

Shuffles are warp collaboration primitives. They should be in namespace `kat::collaboration::warp`, and declared in the warp collaboration primitives header, even if only through inclusion of another file.

Task

We've adapted a tuple implementation; however, that tuple doesn't know that there's "another tuple" it needs to be compatible with... we _do_ know. So, let's try and make `kat::tuple` usable...

Task

The programming guide [says](https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html): > **E.3.14.3. Rvalue references** > > By default, the CUDA compiler will implicitly consider `std::move` and `std::forward` function templates to have `__host__ __device__` execution space qualifiers,...
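For reference, one way a library could avoid depending on this implicit treatment is to define its own `move` and `forward` with explicit execution-space qualifiers. The sketch below assumes that approach and is not taken from cuda-kat; the `sketch` namespace and the `SKETCH_HD` macro are hypothetical names. The function bodies follow the usual textbook definitions of these two templates.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: CUDA execution-space qualifiers under nvcc,
// nothing when compiled as plain host C++.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {

// Cast any argument to an rvalue reference, like std::move.
template <typename T>
SKETCH_HD constexpr typename std::remove_reference<T>::type&& move(T&& t) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

// Forward an lvalue as either an lvalue or an rvalue, depending on T.
template <typename T>
SKETCH_HD constexpr T&& forward(typename std::remove_reference<T>::type& t) noexcept {
    return static_cast<T&&>(t);
}

// Forward an rvalue as an rvalue; forwarding it as an lvalue is rejected.
template <typename T>
SKETCH_HD constexpr T&& forward(typename std::remove_reference<T>::type&& t) noexcept {
    static_assert(!std::is_lvalue_reference<T>::value,
                  "cannot forward an rvalue as an lvalue");
    return static_cast<T&&>(t);
}

}  // namespace sketch

// Usage: the value categories match those produced by the std:: versions.
inline bool move_forward_example() {
    int x = 5;
    bool move_ok = std::is_same<decltype(sketch::move(x)), int&&>::value;
    bool fwd_lvalue_ok = std::is_same<decltype(sketch::forward<int&>(x)), int&>::value;
    bool fwd_rvalue_ok = std::is_same<decltype(sketch::forward<int>(x)), int&&>::value;
    assert(move_ok && fwd_lvalue_ok && fwd_rvalue_ok);
    return move_ok && fwd_lvalue_ok && fwd_rvalue_ok;
}
```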

question