[HIP][device] 4 __shfl_sync functions are missing
CUDA 9 __shfl_sync function is missing. I can use the deprecated __shfl but it would be
be better to have the new function.
Test code:
__global__
static void shflTest(int lid){
int tid = threadIdx.x;
float value = tid + 0.1f;
int* ivalue = reinterpret_cast<int*>(&value);
//use the integer shfl
int ix = __shfl(ivalue[0],5,32);
int iy = __shfl_sync(0xFFFFFFFF, ivalue[0],5,32);
float x = reinterpret_cast<float*>(&ix)[0];
float y = reinterpret_cast<float*>(&iy)[0];
if(tid == lid){
printf("shfl tmp %d %d\n",ix,iy);
printf("shfl final %f %f\n",x,y);
}
}
int main()
{
shflTest<<<1,32>>>(0);
cudaDeviceSynchronize();
return 0;
}
__shfl_up_sync, __shfl_down_sync, and __shfl_xor_sync as well.
We have some work left in the device compiler to support certain cuda 9 device side features such as the sync APIs. Also note that most AMD devices have a "warp size" of 64, so any code using a 32 bit mask is already broken.
Hi, I was trying to hipify a code and there are a few calls to __shfl_down_sync. The __shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?
@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,
#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif
The specific constant I'm using there (__HIP_PLATFORM_HCC) is old, so a newer one would be better.
Hi, I was trying to hipify a code and there are a few calls to __shfl_down_sync. The __shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?
__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.
The specific constant I'm using there (
__HIP_PLATFORM_HCC) is old, so a newer one would be better.
What do you mean by old?
I think mentions of hcc are being removed over time.
Hi, I was trying to hipify a code and there are a few calls to __shfl_down_sync. The __shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?
__shfl_downis deprecated sinceCUDA 9.0, but it is not removed and still can be used even byCUDA 11.2.1.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
Deprecation Notice: __shfl, __shfl_up, __shfl_down, and __shfl_xor have been deprecated in CUDA 9.0 for all devices.
Removal Notice: When targeting devices with compute capability 7.x or higher, __shfl, __shfl_up, __shfl_down, and __shfl_xor are no longer available and their sync variants should be used instead.
I will check though if it is on the code's side as it is not mine. Thanks for the answer.
@gmarkomanolis What I do when using
hipify-perlas part of a build process is include a construction like,#ifdef __HIP_PLATFORM_HCC__ #define SHFL_DOWN(val, offset) __shfl_down(val, offset) #else #define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset) #endifThe specific constant I'm using there (
__HIP_PLATFORM_HCC) is old, so a newer one would be better.
Thanks a lot.
Hey, @emankov, any update on __shfl_sync ? It would be great to have this implemented I think.
Any update on this? I am specifically looking for a solution to __shfl_sync
If your code uses a mask of 0xffffffff, then you can just replace your _sync calls with the non-sync ones and it should work fine.
@Kaveh01 Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks!
@ppanchad-amd You could have just said that the _sync functions were added to the C++ kernel language in some ROCm/HIP version :shrug:
I am using rocm 6.1.3 yet I still keep getting this issue. "error: use of undeclared identifier '__shfl_down_sync'"
The *_sync functions are not available in 6.1, see, e.g. https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail . The develop branch has an implementation which may appear in a future release.
The develop implementation mentioned above has restrictions on its use that match the restrictions stated for pascal in the cuda guide.
The C++ Language Extensions documentation for ROCm 6.1.2 / HIP 6.1.40092 describes this as if the __sync functions were already a thing.
Note that the
__syncvariants are made available in ROCm 6.2
Note that this is the only reference to ROCm 6.2 in the entire document, the following sections simply list all the _sync variants without any reference to the future ROCm version. Why are future features documented in earlier releases? It seems like somebody just copy-pasted it from NVIDIA :shrug:
Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel free to comment and we can reopen this thread, or you can submit a new issue.