cuda-kat

CUDA kernel author's tools

Results 65 cuda-kat issues

`atomic::compare_and_swap()` is not covered by the unit tests right now. We need to test it. Also, consider the insights here: https://stackoverflow.com/questions/62091548/atomiccas-for-bool-implementatin
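For context on the linked question: CUDA's `atomicCAS()` has no byte-wide overload, so one common approach for `bool` is to run a CAS loop on the 32-bit word containing the byte. Below is a minimal host-side sketch of that idea in plain C++, using `std::atomic` in place of the device intrinsic. The function name `byte_compare_and_swap` and its signature are hypothetical and are not part of cuda-kat.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not cuda-kat's API): emulate a compare-and-swap on a
// single byte of a 32-bit word, using only a 32-bit CAS.
// byte_index selects bits [8*byte_index, 8*byte_index + 7] of the word.
// Returns the byte value observed before the operation.
inline std::uint8_t byte_compare_and_swap(
    std::atomic<std::uint32_t>& word,
    unsigned byte_index,
    std::uint8_t expected,
    std::uint8_t desired)
{
    assert(byte_index < 4);
    const unsigned shift = byte_index * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;

    std::uint32_t observed = word.load();
    while (true) {
        const auto current_byte =
            static_cast<std::uint8_t>((observed & mask) >> shift);
        if (current_byte != expected) {
            return current_byte; // comparison failed; nothing is written
        }
        const std::uint32_t replacement =
            (observed & ~mask) | (std::uint32_t{desired} << shift);
        // On failure, compare_exchange_weak refreshes `observed`, so the loop
        // re-examines the byte; neighbouring bytes may have changed meanwhile.
        if (word.compare_exchange_weak(observed, replacement)) {
            return current_byte;
        }
    }
}
```

A unit test for the device-side function would presumably want to cover the same cases: a successful swap, a failed comparison, and a swap that leaves the neighbouring bytes intact.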

My unit tests for `kat::tuple`, `kat::span` and `kat::array` don't include any case of passing these structures as arguments to a kernel! That needs to be rectified...
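Kernel arguments are copied to the device by value, so a structure passed this way generally needs to be trivially copyable. As a rough sketch, a compile-time check along the following lines could accompany the runtime tests. `span_like`, `owning_buffer` and `usable_as_kernel_argument` are hypothetical stand-ins written for illustration; they are not the actual `kat::` types.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a span-like structure (not kat::span itself).
template <typename T>
struct span_like {
    T* data;
    std::size_t size;
};

// Hypothetical counter-example: a user-provided copy constructor makes the
// type non-trivially-copyable.
struct owning_buffer {
    owning_buffer(const owning_buffer& other);
    int* data;
};

// Assumption: trivial copyability is taken as the criterion for being safely
// passable by value to a kernel.
template <typename T>
constexpr bool usable_as_kernel_argument()
{
    return std::is_trivially_copyable<T>::value;
}

static_assert(usable_as_kernel_argument<span_like<int>>(),
    "span_like should be passable to a kernel by value");
static_assert(!usable_as_kernel_argument<owning_buffer>(),
    "owning_buffer should be rejected");
```

This only checks a necessary property at compile time; it does not replace a test that actually launches a kernel with these structures as arguments.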

It would be useful to have a more C++'ish mechanism to obtain random numbers.

enhancement

Copying or moving data has (at least) the following variants:
1. Element size in bytes
1.1 size is a power of 2
1.2 size is a natively-supported power of 2...

Task

I wonder if we should consider a version of `append_to_global_memory()` where each thread may have its data elsewhere (at an address); and perhaps also a version where each thread has...

question

The `have_a_single_lane_compute` primitive currently returns a value. But - this value is only valid for the single computing lane, and the caller doesn't even know which lane that is. That...

Task

I mis-designed the reduction and scan functions to take a neutral value as a template parameter; this works for integral types, not for floating-point types (until C++20). For the time...
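To illustrate the limitation: before C++20, a declaration such as `template <typename T, T Neutral>` cannot be instantiated with `T = float` or `T = double`, because floating-point non-type template parameters are not allowed. One possible way around it, sketched here in host-side C++, is to take the neutral element as an ordinary function argument. The name `reduce_with_neutral` and its signature are hypothetical and do not reflect how cuda-kat actually resolved this.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical sequential reduction: the neutral element is a runtime
// argument rather than a template parameter, so floating-point types work
// with any C++ standard version.
template <typename T, typename BinaryOp>
T reduce_with_neutral(const T* data, std::size_t length, BinaryOp op, T neutral)
{
    assert(data != nullptr || length == 0);
    T result = neutral;
    for (std::size_t i = 0; i < length; ++i) {
        result = op(result, data[i]);
    }
    return result;
}

// Usage example: summing doubles, with 0.0 as the neutral element of addition.
inline double sum_of_example_values()
{
    const double values[] = { 0.5, 0.25, 0.25 };
    return reduce_with_neutral(values, 3, std::plus<double>{}, 0.0);
}
```

An empty input simply yields the neutral element, which is the behaviour the template-parameter design was meant to provide.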

bug
fixed on development

When compiling for compute capability 3.0 cards, we get failures due to some instructions not being supported. Let's work around this.

bug
fixed on development

Need to correct a typo in `time.cuh` as follows: `- __nanosleep(unsigned int ns);` `+ __nanosleep(num_cycles);`. Detected when compiling for CC 7.x.

bug
fixed on development

The four functions, `__isGlobal()`, `__isLocal()`, `__isShared()` and `__isConstant()`, are available from CUDA (as of 10.2, it seems), so we don't need to define them ourselves under `on_device/ptx/`. Probably.

Task