composable_kernel
Debug build fails with relocation out of range: R_X86_64_REX_GOTPCRELX | R_X86_64_PC32
Problem Description
During a Debug build, the link step fails with R_X86_64_REX_GOTPCRELX and R_X86_64_PC32 "out of range" relocation errors, as follows:
# issue1
[ 83%] Built target test_convnd_bwd_data
ld.lld: error: ../../library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/mem/_ZN2ck16tensor_operation6device47DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3ILi3ENS_13tensor_layout11convolution6NDHWGCENS4_6GKZYXCENS_5TupleIJEEENS4_6NDHWGKEffffS8_fNS0_12element_wise11PassThroughESB_SB_LNS1_32ConvolutionForwardSpecializationE0ELNS1_18GemmSpecializationE7ELi64ELi16ELi16ELi128ELi8ELi8ELi16ELi16ELi1ELi1ENS_8SequenceIJLi16ELi4ELi1EEEENSE_IJLi1ELi0ELi2EEEESG_Li2ELi4ELi4ELi0ESF_SG_SG_Li2ELi4ELi4ELi0ELi1ELi1ENSE_IJLi1ELi16ELi1ELi4EEEELi4ELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE0EffE7Invoker3RunEPKNS1_12BaseArgumentERK12StreamConfig+0x10): relocation R_X86_64_REX_GOTPCRELX out of range: -3824024220 is not in [-2147483648, 2147483647]; references 'typeinfo for ck::tensor_operation::device::BaseArgument'
# issue2
[ 96%] Built target device_gemm_universal_instance
ld.lld: error: ../../library/src/tensor_operation_instance/gpu/gemm_universal/CMakeFiles/device_gemm_universal_instance.dir/device_gemm_xdl_universal_f16_f16_f16/device_gemm_xdl_universal_f16_f16_f16_mk_kn_mn_comp_default_instance.cpp.o:(.rodata._ZN2ck16tensor_operation6device27getGemmSpecializationStringB5cxx11ERKNS1_18GemmSpecializationE+0x0): relocation R_X86_64_PC32 out of range: 3010561409 is not in [-2147483648, 2147483647]; references section '.text._ZN2ck16tensor_operation6device27getGemmSpecializationStringB5cxx11ERKNS1_18GemmSpecializationE'
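Both messages report a displacement that does not fit in the signed 32-bit field these relocation types use, which is the range [-2147483648, 2147483647] that the linker prints. As a small sanity check of the two reported values, here is a sketch in POSIX shell. The function name `fits_rel32` is made up for illustration and is not part of lld or CK.

```shell
#!/bin/sh
# Hypothetical helper (not part of lld or CK): report whether a displacement
# fits the signed 32-bit range printed in the linker errors above.
fits_rel32() {
  if [ "$1" -ge -2147483648 ] && [ "$1" -le 2147483647 ]; then
    echo "in range"
  else
    echo "out of range"
  fi
}

fits_rel32 -3824024220   # displacement reported in issue1
fits_rel32 3010561409    # displacement reported in issue2
```

Both calls print "out of range", which matches the linker output: the referenced symbols ended up more than 2 GiB away from the code that references them.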
Operating System
Ubuntu 20.04
CPU
Ryzen
GPU
AMD Instinct MI300
Other
No response
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
sudo docker run --rm -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined -e HIP_VISIBLE_DEVICES=2 -v /home/zhengjli:/workspace -w /workspace rocm/composable_kernel:ck_ub20.04_rocm6.1_amd-staging
cd composable_kernel/
mkdir build && cd build
cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O1 -mcmodel=large -fPIC" \
-D CMAKE_BUILD_TYPE=Debug \
-D GPU_TARGETS="gfx941" \
..
make -j128 install
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
This follows up on https://github.com/ROCm/composable_kernel/issues/508.
The Release build was fixed previously.
thanks a lot David
@ZJLi2013 Internal ticket has been created to investigate this issue. Thanks!
@ZJLi2013 Some suggestions from the CK team (@illsilin):
- "The debug build for the entire library never really worked. Every time the compiler sees the inline assembly in the code, it freaks out. So if a user wants to run a specific kernel in debug mode, they should build that specific example or instance with debug flags, not the entire library."
- "Up until ROCm 6.2, whenever you tried building a large library or binary, the linker could not handle anything over 2 GB. The size could be due either to building for multiple targets or to building in debug mode. Starting with ROCm 6.2, there is a new compiler flag, '--offload-compress', which helps decrease the size of the device code."
Can you try those suggestions for your builds?
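The two suggestions could be combined roughly as in the sketch below. It reuses the configure command from the steps to reproduce, adds `--offload-compress` to the compiler flags (this assumes ROCm 6.2 or later, per the comment above), and builds a single target instead of the whole library. The target `test_convnd_bwd_data` is taken from the build log above purely as an example, and the `run` wrapper is a hypothetical dry-run helper, not part of CK. Whether this avoids the relocation errors has not been confirmed in this thread.

```shell
#!/bin/sh
# Dry run by default: prints each command. Set RUN=1 to actually execute them.
# `run` is a hypothetical helper for illustration only.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Assumption: ROCm >= 6.2, where the compiler accepts --offload-compress.
CXX_FLAGS="-O1 -mcmodel=large -fPIC --offload-compress"

# Example target name copied from the build log; substitute the example or
# instance you actually want to debug.
TARGET="test_convnd_bwd_data"

# Configure from inside the build/ directory, as in the original steps.
run cmake \
  -D CMAKE_PREFIX_PATH=/opt/rocm \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_CXX_FLAGS="$CXX_FLAGS" \
  -D CMAKE_BUILD_TYPE=Debug \
  -D GPU_TARGETS="gfx941" \
  ..

# Build only the selected target rather than `make install` for everything.
run make -j16 "$TARGET"
```

Run as-is, the script only echoes the two commands so they can be reviewed first; running it with `RUN=1` from the build directory executes them.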
Closing this issue as there have been no updates for 3 weeks.