No support for 16-bit atomics
I'd like to use RAJA's atomics on CUDA's 16-bit half-precision values. Currently the implementation only supports 32- and 64-bit values.
This is true, though sadly it's not a completely straightforward fix. The hardware doesn't support 16-bit atomics, so it's something we have to build on top of. The upcoming rework of the RAJA atomics will pull in desul, which supports this, but at the moment desul falls all the way back to its generic sharded-lock-table approach to implement it.
That would get the functionality working, and it's part of why we went to pretty great lengths to include that support so any arbitrary size will work, but it's computationally expensive and potentially contentious. If there's any way to avoid doing atomics directly on that size, I would recommend it. If you need to support them now and you can tolerate C++14, give desul a try; it should work for you right away, and we could use some more user experience outside of the RAJA and Kokkos tests.
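For reference, here is a minimal sketch of what calling desul directly on a 16-bit value might look like. The header name and the MemoryOrderRelaxed/MemoryScopeDevice tags reflect my reading of the desul interface, so treat them as assumptions and check them against the desul headers you actually build with.

#include <desul/atomics.hpp>  // assumed header for the standalone desul atomics
#include <cuda_fp16.h>

// Each thread adds its value into a single half-precision accumulator.
// desul dispatches to its lock-table fallback when no native 16-bit
// atomic is available for the target architecture.
__global__ void accumulate_half( __half * total, __half const * values, int n )
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if ( i < n )
  {
    desul::atomic_fetch_add( total, values[ i ],
                             desul::MemoryOrderRelaxed(),
                             desul::MemoryScopeDevice() );
  }
}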
Well, I don't know whether the hardware supports it or not, but they do have a function: https://docs.nvidia.com/cuda/archive/10.1/cuda-c-programming-guide/index.html#atomicadd
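For what it's worth, the documented overload is __half atomicAdd(__half *address, __half val), available on compute capability 7.x and higher via cuda_fp16.h. Used directly, outside of RAJA, a bare-bones kernel would look roughly like this (the kernel and array names are just placeholders):

#include <cuda_fp16.h>

// Each thread adds its value into a single half-precision accumulator
// using the native atomicAdd overload (requires sm_70 or newer).
__global__ void sum_half( __half * total, __half const * values, int n )
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if ( i < n )
  {
    atomicAdd( total, values[ i ] );
  }
}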
I got it working (I think; still ironing out other issues) with this hack:
#if defined( __CUDACC__ )
#include <cuda_fp16.h>   // __half and the native ::atomicAdd( __half*, __half ) overload
#include <type_traits>   // std::enable_if_t, std::is_same

namespace RAJA
{
// Overload selected when calling RAJA::atomicAdd< RAJA::cuda_atomic >( ... )
// on __half; forwards straight to CUDA's native half-precision atomicAdd.
template< typename ATOMIC_TYPE >
inline __device__ std::enable_if_t< std::is_same< ATOMIC_TYPE, cuda_atomic >::value, __half >
atomicAdd( __half * const address, __half const value )
{ return ::atomicAdd( address, value ); }
}
#endif
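With that overload in place, the call site inside a RAJA kernel would look something like the sketch below. The array names and block size are placeholders I made up, but the explicit policy argument follows RAJA's usual atomicAdd< Policy > form and is what lets the enable_if above kick in.

// total and values are assumed to be device-accessible __half pointers.
RAJA::forall< RAJA::cuda_exec< 256 > >(
    RAJA::RangeSegment( 0, n ),
    [=] RAJA_DEVICE ( int i )
    {
      // Explicitly naming cuda_atomic selects the __half overload above.
      RAJA::atomicAdd< RAJA::cuda_atomic >( &total[ 0 ], values[ i ] );
    } );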
Interesting, and thanks. This is probably worth adding as a one-off, though we'll need to guard it so it's only used on compute capability 7.0+. I wonder what customer convinced them to do this. It's pretty telling that they support only the Add operation for a half-sized float type, not even exchange or CAS.
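A guard along these lines is what I have in mind; just a sketch of the usual __CUDA_ARCH__ check, with 700 (sm_70) as the assumed threshold:

// __CUDA_ARCH__ is only defined during device compilation passes; 700 == sm_70.
#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ >= 700 )
// ... half-precision atomicAdd overload goes here ...
#endif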