
add CUDA intrinsics

Open mxmlnkn opened this issue 10 years ago • 3 comments

I have code that makes use of warpSize and __shfl_down. The latter may be impossible to implement in alpaka, but warpSize could be mapped to something meaningful, e.g. elemDim.

mxmlnkn avatar Feb 28 '16 01:02 mxmlnkn

Which operation do you perform with warpSize? Warps are not a layer in alpaka, so you can guard warp intrinsics with #ifdef on NVIDIA hardware and, on all other hardware, write your algorithm as if the warp size were one.
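
A minimal host-compilable sketch of this #ifdef pattern (the names cWarpSize and perWarpReduce are illustrative, not part of cupla or alpaka):

```cpp
#include <cassert>

// Compile-time warp size: 32 when compiling CUDA device code, 1 elsewhere,
// so non-NVIDIA backends treat every thread as its own "warp".
#ifdef __CUDA_ARCH__
constexpr int cWarpSize = 32;
#else
constexpr int cWarpSize = 1;
#endif

// Per-warp reduction skeleton: on CUDA the loop performs the usual
// shuffle-based tree reduction; on other backends the warp size is one,
// so each thread already holds its complete "warp" result.
template <typename T, typename F>
T perWarpReduce(T value, F f)
{
#ifdef __CUDA_ARCH__
    for (int delta = cWarpSize / 2; delta > 0; delta /= 2)
        value = f(value, __shfl_down(value, delta)); // CUDA-only intrinsic
#else
    (void)f; // warp size 1: nothing to combine
#endif
    return value;
}
```

Each backend then calls the same perWarpReduce; only the CUDA path ever touches the warp intrinsic.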

psychocoderHPC avatar Feb 28 '16 07:02 psychocoderHPC

I'm doing a __shfl_down reduction using warpSize. I guess it's very CUDA specific anyway:

    /* Reduce within each warp (warpSize == 32 assumed, checked at runtime).
     * Note: since CUDA 9, __shfl_down is deprecated in favor of __shfl_down_sync. */
    int constexpr cWarpSize = 32;
    assert( cWarpSize == warpSize );
    /* Tree reduction: halve the shuffle stride each step until lane 0
     * holds the combined value of the whole warp. */
    #pragma unroll
    for ( int32_t warpDelta = cWarpSize / 2; warpDelta > 0; warpDelta /= 2 )
        localReduced = f( localReduced, __shfl_down( localReduced, warpDelta ) );

    /* One thread per warp commits the warp's partial result atomically. */
    if ( threadIdx.x % cWarpSize == 0 )
        atomicFunc( rdpResult, localReduced, f );
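
To illustrate what the shuffle loop above computes, here is a host-side C++ emulation of the 32-lane tree reduction (purely illustrative; shflDownStep and warpReduce are made-up names, and the emulation models __shfl_down's behavior of returning the caller's own value for out-of-range source lanes):

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;

// One step of the tree reduction: every lane combines its own value with
// the value held by the lane `delta` positions above it, as __shfl_down does.
template <typename T, typename F>
std::array<T, kWarpSize> shflDownStep(std::array<T, kWarpSize> lanes, int delta, F f)
{
    std::array<T, kWarpSize> out = lanes;
    for (int lane = 0; lane < kWarpSize; ++lane)
    {
        int src = lane + delta;
        // Out-of-range reads yield the caller's own value; those upper lanes
        // produce meaningless partial results, but only lane 0 is consumed.
        T neighbor = (src < kWarpSize) ? lanes[src] : lanes[lane];
        out[lane] = f(lanes[lane], neighbor);
    }
    return out;
}

// Full reduction: halve the stride each step; lane 0 ends up with the result.
template <typename T, typename F>
T warpReduce(std::array<T, kWarpSize> lanes, F f)
{
    for (int delta = kWarpSize / 2; delta > 0; delta /= 2)
        lanes = shflDownStep(lanes, delta, f);
    return lanes[0];
}
```

Running this with lanes holding 0..31 and f = addition leaves the sum of all lanes in lane 0, which is exactly the per-warp partial that the real kernel then folds into rdpResult with the atomic.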

mxmlnkn avatar Feb 28 '16 10:02 mxmlnkn

@psychocoderHPC should this one be closed?

sbastrakov avatar Feb 04 '20 12:02 sbastrakov