cutlass [FEA] BFloat16x2 Atomics

Open HanGuo97 opened this issue 7 months ago • 1 comment

Currently, CUTLASS only implements a specialization of atomic_add for half2, but not for nv_bfloat162. As a result, BlockStripedReduce can only be specialized for half2, and not for nv_bfloat162.

Is there any reason not to provide a specialization for nv_bfloat162? It looks like a very simple change, but maybe I'm missing something. Thanks in advance for the help!
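
For concreteness, here is a rough sketch of what I have in mind. It is not CUTLASS code: atomic_add_bf16x2, accumulate, unpack, and run_demo are hypothetical names made up for this example. It relies on the atomicAdd overload for __nv_bfloat162 from cuda_bf16.h and assumes compute capability 8.0 or higher. As far as I understand, that overload updates each 16-bit element atomically but does not guarantee the pair is a single 32-bit atomic access, which may or may not matter for BlockStripedReduce.

```cuda
#include <cuda_bf16.h>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUTLASS API): atomically adds a packed bfloat16 pair.
__device__ inline void atomic_add_bf16x2(__nv_bfloat162 *ptr, __nv_bfloat162 data) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 800)
  // Each 16-bit lane is updated atomically; the pair as a whole is not
  // guaranteed to be one 32-bit atomic access.
  atomicAdd(ptr, data);
#else
  // Assumption for this sketch: no fallback on older architectures (no-op).
  (void)ptr;
  (void)data;
#endif
}

// Every thread adds the pair (1, 2) into the same accumulator.
__global__ void accumulate(__nv_bfloat162 *acc) {
  atomic_add_bf16x2(acc, __floats2bfloat162_rn(1.0f, 2.0f));
}

// Unpacks the accumulator to floats so the host needs no bfloat16 conversions.
__global__ void unpack(const __nv_bfloat162 *acc, float2 *out) {
  *out = make_float2(__low2float(*acc), __high2float(*acc));
}

// Host driver: returns both lane sums after `threads` concurrent adds.
float2 run_demo(int threads) {
  __nv_bfloat162 *acc = nullptr;
  float2 *out_d = nullptr;
  float2 out_h = make_float2(0.0f, 0.0f);
  cudaMalloc(&acc, sizeof(__nv_bfloat162));
  cudaMalloc(&out_d, sizeof(float2));
  cudaMemset(acc, 0, sizeof(__nv_bfloat162));  // all-zero bits encode +0.0 in both lanes
  accumulate<<<1, threads>>>(acc);
  unpack<<<1, 1>>>(acc, out_d);
  cudaMemcpy(&out_h, out_d, sizeof(float2), cudaMemcpyDeviceToHost);
  cudaFree(acc);
  cudaFree(out_d);
  return out_h;
}
```

This would need to be built for an Ampere-or-newer target (for example with nvcc -arch=sm_80). An actual CUTLASS specialization would presumably follow the structure of the existing half2 one rather than this standalone helper.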

Jul 04 '24 20:07 HanGuo97
