cuda-kat

Specialize functions with many reads/writes for sub-4-byte element types

Open eyalroz opened this issue 4 years ago • 0 comments

We have many templated functions that perform a (potentially) large number of memory reads or writes, and which therefore benefit from coalesced memory operations. However, most of them, if not all, are not specialized for element types smaller than 4 bytes, so for such types they are slower than they could be. Examples include copying, filling, and appending to global memory.

We should add specializations for these cases.
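To make the idea concrete, here is a rough sketch of what such a specialization might look like for a fill. It is not cuda-kat code: the names `SKETCH_HD`, `replicate_to_word`, `fill_wordwise` and `fill_kernel` are hypothetical, and it assumes element types of 1 or 2 bytes whose alignment equals their size. The aligned middle of the buffer is written as whole 32-bit words, so consecutive threads store consecutive words; the unaligned head and the leftover tail are written element by element.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro: lets the same sketch compile as plain C++ or as CUDA.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Build a 32-bit word consisting of repeated copies of a 1- or 2-byte value.
template <typename T>
SKETCH_HD inline std::uint32_t replicate_to_word(T value)
{
    static_assert(sizeof(T) == 1 || sizeof(T) == 2, "sub-4-byte element types only");
    std::uint32_t word = 0;
    unsigned char* word_bytes = reinterpret_cast<unsigned char*>(&word);
    const unsigned char* value_bytes = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof(word); ++i) {
        word_bytes[i] = value_bytes[i % sizeof(T)];
    }
    return word;
}

// The share of the fill done by one of num_threads cooperating threads.
template <typename T>
SKETCH_HD inline void fill_wordwise(T* data, std::size_t length, T value,
                                    std::size_t thread_index, std::size_t num_threads)
{
    static_assert(alignof(T) == sizeof(T), "assumes naturally aligned elements");
    constexpr std::size_t word_size = sizeof(std::uint32_t);
    constexpr std::size_t per_word = word_size / sizeof(T);

    // Number of elements preceding the first 4-byte boundary.
    std::size_t misalignment = reinterpret_cast<std::uintptr_t>(data) % word_size;
    std::size_t head = (misalignment == 0) ? 0 : (word_size - misalignment) / sizeof(T);
    if (head > length) { head = length; }
    std::size_t num_words = (length - head) / per_word;
    std::size_t tail_start = head + num_words * per_word;

    // Aligned body: one 4-byte store per iteration, strided by thread count.
    // Note: this type-punning is the customary device-code idiom; strictly
    // conforming host C++ would use memcpy for the word store instead.
    std::uint32_t* words = reinterpret_cast<std::uint32_t*>(data + head);
    const std::uint32_t word = replicate_to_word(value);
    for (std::size_t i = thread_index; i < num_words; i += num_threads) {
        words[i] = word;
    }
    // Unaligned head and leftover tail: plain element-wise stores.
    for (std::size_t i = thread_index; i < head; i += num_threads) {
        data[i] = value;
    }
    for (std::size_t i = tail_start + thread_index; i < length; i += num_threads) {
        data[i] = value;
    }
}

#ifdef __CUDACC__
// Grid-level wrapper: every thread in the grid takes part in the fill.
template <typename T>
__global__ void fill_kernel(T* data, std::size_t length, T value)
{
    std::size_t thread_index = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    std::size_t num_threads = gridDim.x * static_cast<std::size_t>(blockDim.x);
    fill_wordwise(data, length, value, thread_index, num_threads);
}
#endif
```

Copying and appending would presumably follow the same head/body/tail split, with the extra complication that source and destination may be misaligned differently.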

eyalroz, Feb 28 '20 17:02