libcudacxx
[RFE] std::experimental::fixed_size_simd and arithmetic operators
This requests that std::experimental::fixed_size_simd and its arithmetic operators be added to libcu++.
This would provide a unified, portable interface to operations that NVIDIA GPU hardware accelerates with vector instructions, including:
- elementwise multiply
- elementwise add
- elementwise multiply-add (e.g. half <= half * half + half, bfloat16 <= bfloat16 * bfloat16 + bfloat16)
- convert-and-pack (int8 <= float, half <= float, bfloat16 <= float)
- transcendental functions available in the CUDA Math Library (e.g. sqrt, exp, tanh, log, log10, erf, sin, cos, tan)
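To make the shape of the request concrete, here is a minimal host-side sketch of the kind of interface the list above implies. It is not libcu++ code and not the real std::experimental::fixed_size_simd: `fixed_vec` and `multiply_add` are hypothetical stand-ins written only for illustration, and the loops stand in for whatever vector instructions a real implementation would select.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for std::experimental::fixed_size_simd<T, N>.
// It only models the interface shape: N lanes of T with elementwise operators.
template <typename T, std::size_t N>
struct fixed_vec {
    std::array<T, N> lanes;

    // Read one lane.
    T operator[](std::size_t i) const { return lanes[i]; }

    // Elementwise add.
    friend fixed_vec operator+(const fixed_vec& a, const fixed_vec& b) {
        fixed_vec r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }

    // Elementwise multiply.
    friend fixed_vec operator*(const fixed_vec& a, const fixed_vec& b) {
        fixed_vec r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
        return r;
    }
};

// Elementwise multiply-add written with the operators above (hypothetical helper).
// A real implementation could specialize this per type and width so that it
// lowers to a single fused vector instruction instead of a multiply and an add.
template <typename T, std::size_t N>
fixed_vec<T, N> multiply_add(const fixed_vec<T, N>& a,
                             const fixed_vec<T, N>& b,
                             const fixed_vec<T, N>& c) {
    return a * b + c;
}
```

User code written against such an interface stays the same regardless of element type or width. On the device side, a two-lane half-precision specialization could plausibly map `multiply_add` to the packed `__hfma2` intrinsic from `cuda_fp16.h`, though that mapping is an assumption here, not something the request specifies.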
CUTLASS currently implements the above as partial specializations of