cuda-kat
Support the "SIMD"-like intrinsics
CUDA offers many intrinsic functions for working with multiple 1-byte and 2-byte values packed into native 4-byte integers:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html
We should offer explicit access to these, in a form that is better structured than a heap of idiosyncratic names (perhaps via the kat::array type? Some other way?).
We should also check our existing code to see where specializations are in order, so that we actually benefit from these instructions (e.g. in sequence operations or collaboration primitives).
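To make the "structured access" idea concrete, here is a minimal sketch of what a typed wrapper might look like. Everything named here (`simd_sketch`, `u8x4`, `add`, `SKETCH_HD`) is hypothetical and not part of cuda-kat; only `__vadd4` is an actual CUDA intrinsic. The host branch is a plain bit-twiddling fallback, included so the same function can be exercised outside device code.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of this is existing cuda-kat API.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace simd_sketch {

// Four unsigned 8-bit lanes packed into one native 32-bit word.
struct u8x4 {
    std::uint32_t packed;
};

// Lane-wise addition with per-byte wrap-around (no carry between lanes).
SKETCH_HD inline u8x4 add(u8x4 a, u8x4 b)
{
#ifdef __CUDA_ARCH__
    // Device code: use the CUDA SIMD intrinsic for per-byte addition.
    return { __vadd4(a.packed, b.packed) };
#else
    // Host fallback: add the low 7 bits of every lane, then restore
    // each lane's top bit with XOR so carries never cross lanes.
    const std::uint32_t top_bits = 0x80808080u;
    const std::uint32_t low_sum = (a.packed & ~top_bits) + (b.packed & ~top_bits);
    return { low_sum ^ ((a.packed ^ b.packed) & top_bits) };
#endif
}

} // namespace simd_sketch
```

With a wrapper type like this, callers would write `add(x, y)` on a `u8x4` rather than remembering `__vadd4`, and the other intrinsics (saturating variants, comparisons, 2-byte lanes) could hang off the same small set of types. Whether this belongs on `kat::array` specializations or on a separate type is exactly the open question above.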