
[RFE] std::experimental::fixed_size_simd and arithmetic operators

Open kerrmudgeon opened this issue 2 years ago • 0 comments

This issue requests that std::experimental::fixed_size_simd and its arithmetic operators be added to libcu++.

This would give unified, portable access to operations that NVIDIA GPU hardware accelerates with vector instructions, including:

  • elementwise multiply
  • elementwise add
  • elementwise multiply-add (e.g. half <= half * half + half, bfloat16 <= bfloat16 * bfloat16 + bfloat16)
  • convert-and-pack (int8 <= float, half <= float, bfloat16 <= float)
  • transcendental functions available in the CUDA Math Library (e.g. sqrt, exp, tanh, log, log10, erf, sin, cos, tan)
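To make the operations listed above concrete, here is a minimal host-side sketch in plain C++. The type fixed_vec and the helpers multiply_add and convert_to_int8 are hypothetical stand-ins invented for illustration; they are not the Parallelism TS type or any libcu++ API, and the saturating, round-to-nearest behavior of the int8 conversion is an assumption, not something this issue specifies. A real implementation would presumably map each loop onto packed hardware instructions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::experimental::fixed_size_simd<T, N>.
// It mirrors only the elementwise interface; it is NOT the Parallelism TS type.
template <typename T, std::size_t N>
struct fixed_vec {
    std::array<T, N> lanes;

    static constexpr std::size_t size() { return N; }
    T operator[](std::size_t i) const { return lanes[i]; }

    // Elementwise add.
    friend fixed_vec operator+(const fixed_vec& a, const fixed_vec& b) {
        fixed_vec r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }

    // Elementwise multiply.
    friend fixed_vec operator*(const fixed_vec& a, const fixed_vec& b) {
        fixed_vec r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
        return r;
    }
};

// Elementwise multiply-add: d = a * b + c, fused per lane via std::fma.
template <typename T, std::size_t N>
fixed_vec<T, N> multiply_add(const fixed_vec<T, N>& a,
                             const fixed_vec<T, N>& b,
                             const fixed_vec<T, N>& c) {
    fixed_vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = std::fma(a[i], b[i], c[i]);
    return r;
}

// Convert-and-pack float -> int8.
// Assumption: saturate to [-128, 127], then round to nearest.
template <std::size_t N>
fixed_vec<std::int8_t, N> convert_to_int8(const fixed_vec<float, N>& a) {
    fixed_vec<std::int8_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        float x = a[i];
        x = x < -128.0f ? -128.0f : (x > 127.0f ? 127.0f : x);
        r.lanes[i] = static_cast<std::int8_t>(std::lrint(x));
    }
    return r;
}
```

With a standard type in place of fixed_vec, user code would write a * b + c once and let the library choose the vector instruction for each element type.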

CUTLASS currently implements the above as partial specializations of operators and extensions, which is non-standard. CUTLASS would drop its own implementation and use libcu++'s once it becomes available.
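For readers unfamiliar with the pattern, the following is a rough sketch of what "partial specializations of operators" can look like. The names small_array and elementwise_multiplies are hypothetical and the code is not copied from CUTLASS; it only illustrates the general technique of specializing an operator functor for a fixed-length array type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-length array type (illustrative only).
template <typename T, std::size_t N>
struct small_array {
    T data[N];
};

// Primary template: scalar multiply functor.
template <typename T>
struct elementwise_multiplies {
    T operator()(const T& a, const T& b) const { return a * b; }
};

// Partial specialization for small_array<T, N>: applies the scalar functor
// to each element. A library could further specialize this for particular
// element types and lengths to emit packed instructions.
template <typename T, std::size_t N>
struct elementwise_multiplies<small_array<T, N>> {
    small_array<T, N> operator()(const small_array<T, N>& a,
                                 const small_array<T, N>& b) const {
        small_array<T, N> r{};
        elementwise_multiplies<T> op;
        for (std::size_t i = 0; i < N; ++i) r.data[i] = op(a.data[i], b.data[i]);
        return r;
    }
};
```

The drawback the issue points to is that every library adopting this pattern defines its own array type and functor names, whereas a standard SIMD type with ordinary arithmetic operators would give one shared vocabulary.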

kerrmudgeon, Dec 14 '21 22:12