cuda-kat Specialize functions with many reads/writes for sub-4-byte element types

Open • eyalroz opened this issue 4 years ago • 0 comments

We have many templated functions that make a (potentially) large number of reads or writes to memory, and which therefore benefit from coalescing their memory operations. However, most, if not all, of them are not specialized for element types smaller than 4 bytes. They are therefore slower than they could be, since handling one element at a time means issuing a narrow 1- or 2-byte access per element instead of a 4-byte (or wider) access covering several elements. Examples include copying, filling, and appending to global memory.

We should add specializations for these cases.
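As a rough illustration only, here is a host-side C++17 sketch of what such a specialization might look like for a fill operation. The names `packed_fill` and `elementwise_fill` are hypothetical and are not part of cuda-kat's API. The sketch assumes the element type is trivially copyable and that its size divides 4. In a CUDA kernel, the same idea would have each thread store a full aligned 4-byte word instead of a single 1- or 2-byte element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Generic (unspecialized) path: one store per element.
template <typename T>
void elementwise_fill(T* dst, std::size_t n, const T& value) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = value;
}

// Hypothetical sketch: for element types of 1 or 2 bytes, replicate the value
// into a 32-bit pattern and write whole 4-byte words; otherwise fall back.
template <typename T>
void packed_fill(T* dst, std::size_t n, const T& value) {
    assert(dst != nullptr || n == 0);
    if constexpr (sizeof(T) >= 4 || 4 % sizeof(T) != 0 ||
                  !std::is_trivially_copyable_v<T>) {
        elementwise_fill(dst, n, value);
    } else {
        constexpr std::size_t per_word = 4 / sizeof(T);

        // Build a 4-byte pattern holding per_word copies of value.
        std::uint32_t pattern = 0;
        for (std::size_t k = 0; k < per_word; ++k) {
            std::memcpy(reinterpret_cast<unsigned char*>(&pattern) + k * sizeof(T),
                        &value, sizeof(T));
        }

        // Head: element-wise stores until dst + i is 4-byte aligned
        // (aligned word access is what a GPU would require).
        std::size_t i = 0;
        while (i < n && reinterpret_cast<std::uintptr_t>(dst + i) % 4 != 0) {
            dst[i++] = value;
        }

        // Body: one 4-byte store per per_word elements.
        const std::size_t words = (n - i) / per_word;
        for (std::size_t w = 0; w < words; ++w) {
            std::memcpy(dst + i + w * per_word, &pattern, sizeof pattern);
        }
        i += words * per_word;

        // Tail: remaining elements that do not fill a whole word.
        for (; i < n; ++i) dst[i] = value;
    }
}
```

On the host, `std::memcpy` keeps the word stores well-defined under C++ aliasing rules. A device-side version would more likely use an aligned `uint32_t` (or wider) store, with the head and tail loops handling the unaligned edges.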

Feb 28 '20 17:02 eyalroz