cuda-kat

CUDA kernel author's tools

Results 65 cuda-kat issues

Most code in the library is currently not covered by any unit tests. Let's add that coverage.

* [x] src/kat/containers/span.hpp
* [x] src/kat/containers/array.hpp
* [x] src/kat/containers/tuple.hpp
* [x] src/kat/on_device/time.cuh
* ...

Task

We need most, if not all, of the functions in [`<algorithm>` and `<numeric>`](https://en.cppreference.com/w/cpp/algorithm) available on the device, for execution at the warp and block level. (But not the uninitialized memory...

enhancement
Task

Some of our code behaves differently under different C++ standard versions, potentially including some code with explicit `#if __cplusplus` conditions. We should therefore make sure and have all unit...

I tried to use `cuda-kat` with C++17, and it fails to compile. Reproducing this should be easy, because merely including the header already causes compilation errors.

bug
fixed on development

We currently have a non-functioning `kat::apply` for tuples - since there's no `kat::invoke`. Let's implement the latter to allow for the former.
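To illustrate the dependency described above, here is a minimal host-side sketch of how an `apply` can be layered on top of an `invoke`. It is not the library's implementation: the `sketch` namespace is hypothetical, `std::tuple` stands in for the library's own tuple type, the `invoke` here handles plain callables only (no member pointers), and a device-side version would additionally need the appropriate CUDA execution-space qualifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, not part of cuda-kat

// Simplified invoke: forwards the arguments to a plain callable.
// (Unlike std::invoke, member pointers are not supported here.)
template <typename F, typename... Args>
constexpr decltype(auto) invoke(F&& f, Args&&... args)
{
    return std::forward<F>(f)(std::forward<Args>(args)...);
}

namespace detail {

// Expands the tuple elements into an argument pack and hands them to invoke.
template <typename F, typename Tuple, std::size_t... Is>
constexpr decltype(auto) apply_impl(F&& f, Tuple&& t, std::index_sequence<Is...>)
{
    return sketch::invoke(std::forward<F>(f),
                          std::get<Is>(std::forward<Tuple>(t))...);
}

} // namespace detail

// apply: calls f with the elements of tuple t as its arguments.
template <typename F, typename Tuple>
constexpr decltype(auto) apply(F&& f, Tuple&& t)
{
    using indices = std::make_index_sequence<
        std::tuple_size<typename std::decay<Tuple>::type>::value>;
    return detail::apply_impl(std::forward<F>(f), std::forward<Tuple>(t), indices{});
}

} // namespace sketch
```

The calls are qualified with `sketch::` so that argument-dependent lookup does not also pick up `std::invoke` or `std::apply` when compiling as C++17.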

enhancement

When compiling with C++17, static assertions regarding swappability with non-swappable and with non-immutable types are applied; and they fail, or rather - one fails and the other doesn't, and it...

bug

If you put these lines:

```cmake
# If you want to use the library without installing it, you'll need to
# copy the file we generate here, cuda-kat-config.cmake, into the...
```

The `CMakeLists.txt` requires `strf` version `0.10.4`, but that version is not available [here](https://github.com/robhz786/strf/tags).

fixed on development

Unfortunately, it seems our lambda-based manipulators are implicitly returning a copy of, rather than a reference to, the ostream they got - due to the implicit return type deduction. In...
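The return-type pitfall described above can be reproduced in a few lines of host-side C++. This is only a sketch: `toy_stream` is a hypothetical copyable stand-in for the stream class the manipulators act on, and the two lambdas are illustrative rather than taken from the library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical copyable stream type, used only for this illustration.
struct toy_stream {
    std::string buffer;
    toy_stream& operator<<(const std::string& s) { buffer += s; return *this; }
};

// Deduced return type: `return os;` deduces toy_stream, so a copy is returned.
auto by_copy = [](toy_stream& os) { os << "[copy]"; return os; };

// Explicit trailing return type: the lambda returns the stream it was given.
auto by_ref = [](toy_stream& os) -> toy_stream& { os << "[ref]"; return os; };

static_assert(std::is_same<decltype(by_copy(std::declval<toy_stream&>())),
                           toy_stream>::value,
              "deduced return type is a value, not a reference");
static_assert(std::is_same<decltype(by_ref(std::declval<toy_stream&>())),
                           toy_stream&>::value,
              "trailing return type preserves the reference");
```

With the by-value version, anything chained after the manipulator is written to a temporary copy and never reaches the original stream. An explicit `-> toy_stream&` (or `decltype(auto)` with a parenthesized return expression) avoids this.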

bug
fixed on development

CUDA offers [many functions](https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html) for working with multiple 1-byte and 2-byte values packed into the native 4-byte integers. We should offer both explicit access to these, which would be...
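For readers unfamiliar with packed-value arithmetic, the following portable C++ sketch emulates what a per-byte wrapping addition over a 4-byte word computes, which is the kind of operation CUDA exposes through intrinsics such as `__vadd4`. The helper name `add_packed_bytes` is hypothetical and is not part of cuda-kat or CUDA; on the device, one would call the intrinsic rather than loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: adds four 1-byte lanes packed into 32-bit words,
// with each lane wrapping around independently (no carry between lanes).
inline std::uint32_t add_packed_bytes(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const int shift = 8 * lane;
        const std::uint32_t x = (a >> shift) & 0xFFu;  // extract lane from a
        const std::uint32_t y = (b >> shift) & 0xFFu;  // extract lane from b
        result |= ((x + y) & 0xFFu) << shift;          // wrap and repack
    }
    return result;
}
```

For example, adding `0x01FF7F80` and `0x01010180` lane by lane gives `0x02008000`: the `0xFF + 0x01` and `0x80 + 0x80` lanes wrap to `0x00` without disturbing their neighbours.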

Task