Building with ROCm/HIP fails on a system without GPU

Open lahwaacz opened this issue 3 months ago • 5 comments

The cuda_lt.sh script contains a --offload-arch=native flag for amdclang:

https://github.com/openucx/ucc/blob/c1734db1b2bc9ffeba5d17b3e81e1a9425dee100/cuda_lt.sh#L31

This flag is meant to select the architecture of the GPU present in the build system. However, if the build system has no GPU, the compiler cannot resolve `native` and the build fails:

$ /opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_executor_kernel.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_executor_kernel.o
/opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_reduce.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_reduce.o
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
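
One possible workaround, sketched here under assumptions rather than taken from the UCC sources, is to append `--offload-arch=native` only when a GPU is actually detected. The helper name `native_arch_flag` and the `AMDGPU_ARCH` override variable are hypothetical. The tool path comes from the error message above. The sketch also assumes that `amdgpu-arch` prints one `gfx` name per line when a GPU is visible and prints nothing (or exits non-zero) otherwise.

```shell
#!/bin/sh
# Hypothetical helper, not part of UCC: emit --offload-arch=native only
# when the amdgpu-arch tool reports at least one GPU on the build host.

# Path taken from the error message; overridable for testing (assumption).
AMDGPU_ARCH="${AMDGPU_ARCH:-/opt/rocm/lib/llvm/bin/amdgpu-arch}"

native_arch_flag() {
    # Assumed behaviour: one gfx name per line on success, empty output
    # or a non-zero exit status when no GPU is visible.
    detected=$("$AMDGPU_ARCH" 2>/dev/null) || detected=""
    if [ -n "$detected" ]; then
        printf '%s\n' "--offload-arch=native"
    fi
}

# Explicit architectures are always passed; "native" is added only if usable.
ARCH_FLAGS="--offload-arch=gfx908 --offload-arch=gfx90a $(native_arch_flag)"
echo "$ARCH_FLAGS"
```

On a GPU-less build host this leaves only the explicit architectures in `ARCH_FLAGS`, so the explicitly listed targets still build. Whether the flag should be dropped silently or made a configure-time option is a design choice for the maintainers.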

lahwaacz, Apr 28 '24 16:04