amrex
amrex copied to clipboard
`__AMDGCN_WAVEFRONT_SIZE` only has expected value during device pass
This is in reference to a previous pull request #3777. The __AMDGCN_WAVEFRONT_SIZE only has the expected (correct) value when referenced in the device pass of compilation, and is left at a default value 64 when referenced in the host pass. Unfortunately, it is apparently nontrivial to expose the device-pass value during the host pass through this macro, and I don't have a suggestion from compiler folks about exposing this value at compile time another way. They aim to add a warning when the host-pass detects this macro is used, but as for compile-time value I'm not sure I have any good idea. Manual selection throughifdefing on the architecture could be inconvenient to maintain.
@mrowan137 Thanks for looking into this! I guess we will set the wavefront size at compile time with our own macro. For now it's 32 for gfx10?? and 64 for all others.
https://github.com/llvm/llvm-project/blob/efc063b621ea0c4d1e452bcade62f7fc7e1cc937/clang/test/Driver/amdgpu-macros.cl#L70-L115
@psychocoderhpc
We might have a way to detect at compile time if the macro we set is correct.
__host__ __device__ void foo ()
{
#ifdef __HIP_DEVICE_COMPILE__
static_assert(__AMDGCN_WAVEFRONT_SIZE == AMREX_AMDGCN_WAVEFRONT_SIZE);
#endif
}
$ hipcc -DAMREX_AMDGCN_WAVEFRONT_SIZE=64 --offload-arch=gfx90a -c foo.cpp
$ hipcc -DAMREX_AMDGCN_WAVEFRONT_SIZE=32 --offload-arch=gfx90a -c foo.cpp
foo.cpp:5:5: error: static assertion failed due to requirement '64 == 32'
static_assert(__AMDGCN_WAVEFRONT_SIZE == AMREX_AMDGCN_WAVEFRONT_SIZE);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx90a.
$ hipcc -DAMREX_AMDGCN_WAVEFRONT_SIZE=64 --offload-arch=gfx1010 -c foo.cpp
foo.cpp:5:5: error: static assertion failed due to requirement '32 == 64'
static_assert(__AMDGCN_WAVEFRONT_SIZE == AMREX_AMDGCN_WAVEFRONT_SIZE);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx1010.
$ hipcc -DAMREX_AMDGCN_WAVEFRONT_SIZE=32 --offload-arch=gfx1010 -c foo.cpp
@WeiqunZhang Thanks for pinging me in this issue:
In our projects, we use this macro if caste for the detection
https://github.com/alpaka-group/mallocMC/blob/bffe2aa2da5e83d356ff8d32f392935b8f7a59fa/src/include/mallocMC/mallocMC_utils.hpp#L74C7-L87
where MALLOCMC_DEVICE_COMPILE is the detection if we are in the host or device compile pass. The only clean device pass detection I found was
https://github.com/alpaka-group/mallocMC/blob/bffe2aa2da5e83d356ff8d32f392935b8f7a59fa/src/include/mallocMC/mallocMC_utils.hpp#L53
__HIP_DEVICE_COMPILE__ is not enough to address the device pass only, therefore, I always check additionally for __HIP_ARCH_HAS_WARP_BALLOT__, which is available in all modern AMD GPUs.
The following example works fine if you do not use static_assert because due to a strange parsing stage in the compiler, the device functions are parsed but without setting __HIP_DEVICE_COMPILE__.
#include <hip/hip_runtime.h>
#include <iostream>
#if(defined(__HIP_ARCH_HAS_WARP_BALLOT__) || defined(__CUDA_ARCH__) || __HIP_DEVICE_COMPILE__ == 1)
# define MALLOCMC_DEVICE_COMPILE 1
#endif
#if defined(__CUDA_ARCH__)
static constexpr auto myWarpSize = 32; // TODO
#elif(MALLOCMC_DEVICE_COMPILE)
// defined:
// https://github.com/llvm/llvm-project/blob/62ec4ac90738a5f2d209ed28c822223e58aaaeb7/clang/lib/Basic/Targets/AMDGPU.cpp#L400
// overview wave front size:
// https://github.com/llvm/llvm-project/blob/efc063b621ea0c4d1e452bcade62f7fc7e1cc937/clang/test/Driver/amdgpu-macros.cl#L70-L115
// gfx10XX has 32 threads per wavefront else 64
# if(HIP_VERSION_MAJOR >= 4)
static constexpr auto myWarpSize = __AMDGCN_WAVEFRONT_SIZE;
# else
static constexpr auto myWarpSize = 64;
# endif
#else
static constexpr auto myWarpSize = 1;
#endif
__host__ __device__ void foo()
{
//static_assert(myWarpSize == EXTERN_DEFINED_WAVE_SIZE);
constexpr int detectedWarpSize = myWarpSize;
printf("device warp size: %i\n",(int)(detectedWarpSize));
}
__global__ void fooKernel()
{
foo();
}
int main()
{
fooKernel<<<1,1>>>();
hipDeviceSynchronize();
std::cout<<"host warp size: "<<__AMDGCN_WAVEFRONT_SIZE<<std::endl;
return 0;
}
hipcc main.cpp -std=c++17 -DEXTERN_DEFINED_WAVE_SIZE=32
result on a AMD GPU with wave size 32.
./a.out
device warp size: 32
host warp size: 64
Sorry I have not carefully enough read the initial comment. Getting the warp size at compile time looks currently not possible.
@WeiqunZhang @ax3l Issuance of Clang warning is implemented here: https://github.com/llvm/llvm-project/pull/91478