cuda-kat

CUDA kernel author's tools

Results 65 cuda-kat issues

Although this repository is C++-oriented, it is still useful to provide some C standard library functions, some of which are already available. Still missing is the `printf()` family of functions: Both...

enhancement
fixed on development

Yes, you knew it had to happen at some point... one of the ugliest, wartiest classes in C++, which you would have liked to just replace with: ``` template struct...

enhancement
Task

I've been having trouble compiling my test programs with NVCC 11.x; see [this SO question](https://stackoverflow.com/questions/69018930/nvcc-wont-compile-perfect-forwarding-with-restrict-and-parameter-packs?noredirect=1#comment121982267_69018930). A workaround for the problem is dropping all of the `__restrict__` qualifiers. We don't really...

bug

Some versions of NVCC complain about unused `lock_guard`s. Perhaps they do nothing on the device side? Perhaps it's an NVCC bug? At any rate, we can overcome this issue with...

Task

It is not possible to naively use a `__device__`-function's pointer in host-side code. However, it is possible to use it if you copy its address from a global device-side variable...

enhancement
question

Currently, we offer the `at_grid_stride()`, `at_block_stride()` and `at_warp_stride()` functions, which take an invokable and ensure the appropriate traversal pattern is used. Would it not be a good idea to offer,...

question
fixed on development

Functions like `is_first()`, `is_last()` etc. in `on_device/grid_info.cuh` should return a boolean value rather than the unsigned integer they currently return.

bug
fixed on development
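
To illustrate the return-type point in the `is_first()` / `is_last()` issue above, here is a minimal host-side sketch in plain C++. The `position` struct and both function names are hypothetical stand-ins, not the library's actual declarations; the sketch only shows how the deduced type differs when a predicate returns `unsigned` instead of `bool`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for "my index within some group of a given size".
// This is NOT cuda-kat's API; it only models the predicate's inputs.
struct position {
    unsigned index;
    unsigned count;
};

// Predicate declared with an unsigned return type: the comparison result
// is silently converted to 0u or 1u.
constexpr unsigned is_last_as_unsigned(position p)
{
    return p.index == p.count - 1;
}

// Predicate declared with a bool return type, as the issue asks for.
constexpr bool is_last(position p)
{
    return p.index == p.count - 1;
}

// The difference is visible to type deduction (e.g. `auto`, templates, overloads).
static_assert(std::is_same<decltype(is_last_as_unsigned(position{0, 1})), unsigned>::value,
              "the unsigned variant yields an integer");
static_assert(std::is_same<decltype(is_last(position{0, 1})), bool>::value,
              "the bool variant yields a boolean");
```

Both variants behave the same inside an `if`, but a caller writing `auto last = is_last_as_unsigned(p);` ends up with an integer that can take part in arithmetic by accident, which is one reason to prefer the `bool` form.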

Using grid_info is cumbersome enough to defeat the purpose of these one-liner utility functions in the first place. While removing it may cause a bit of ambiguity in other namespace...

Task
fixed on development

I have perhaps been overly generous in placing functionality within sub-namespaces. After all, everything is under `kat::` already. As long as there is no chance of collisions, a lot of...

A linear CUDA grid can have 2^31-1 blocks (in the x dimension), each of up to 1024 threads, for a total of a little under 2^41 threads. Currently, our types and...

bug
question
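
The arithmetic in the grid-size issue above can be checked with a short host-side C++ sketch. The snippet is truncated, so this assumes the concern is the width of the integer used for a global thread index; the helper names below are hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Limits as quoted in the issue text (x dimension of a linear grid).
constexpr std::uint64_t max_blocks_x   = (std::uint64_t{1} << 31) - 1;  // 2^31 - 1
constexpr std::uint64_t max_block_size = 1024;                           // 2^10

// Largest possible number of threads in a linear grid: 2^41 - 2^10.
constexpr std::uint64_t max_threads = max_blocks_x * max_block_size;

static_assert(max_threads == (std::uint64_t{1} << 41) - 1024,
              "a little under 2^41 threads");
static_assert(max_threads > std::numeric_limits<std::uint32_t>::max(),
              "the thread count does not fit in 32 bits");

// Hypothetical helper: global linear thread index, widened to 64 bits
// before multiplying.
constexpr std::uint64_t global_thread_index(
    std::uint32_t block_index, std::uint32_t block_size, std::uint32_t thread_index)
{
    return std::uint64_t{block_index} * block_size + thread_index;
}

// The same computation kept in 32 bits: unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t global_thread_index_32(
    std::uint32_t block_index, std::uint32_t block_size, std::uint32_t thread_index)
{
    return block_index * block_size + thread_index;
}
```

With blocks of 1024 threads, block number 2^22 is the first one whose threads get a 32-bit index that has wrapped around, so any grid larger than that needs the wider type.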