composable_kernel

Debug build fails with linker errors: R_X86_64_REX_GOTPCRELX | R_X86_64_PC32 out of range

Open ZJLi2013 opened this issue 1 year ago • 1 comments

Problem Description

During a Debug build, the linker (ld.lld) reports R_X86_64_REX_GOTPCRELX and R_X86_64_PC32 "out of range" relocation errors, as follows:

# issue1 
[ 83%] Built target test_convnd_bwd_data
ld.lld: error: ../../library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/mem/_ZN2ck16tensor_operation6device47DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3ILi3ENS_13tensor_layout11convolution6NDHWGCENS4_6GKZYXCENS_5TupleIJEEENS4_6NDHWGKEffffS8_fNS0_12element_wise11PassThroughESB_SB_LNS1_32ConvolutionForwardSpecializationE0ELNS1_18GemmSpecializationE7ELi64ELi16ELi16ELi128ELi8ELi8ELi16ELi16ELi1ELi1ENS_8SequenceIJLi16ELi4ELi1EEEENSE_IJLi1ELi0ELi2EEEESG_Li2ELi4ELi4ELi0ESF_SG_SG_Li2ELi4ELi4ELi0ELi1ELi1ENSE_IJLi1ELi16ELi1ELi4EEEELi4ELNS_26BlockGemmPipelineSchedulerE0ELNS_24BlockGemmPipelineVersionE0EffE7Invoker3RunEPKNS1_12BaseArgumentERK12StreamConfig+0x10): relocation R_X86_64_REX_GOTPCRELX out of range: -3824024220 is not in [-2147483648, 2147483647]; references 'typeinfo for ck::tensor_operation::device::BaseArgument'

# issue2 
[ 96%] Built target device_gemm_universal_instance
ld.lld: error: ../../library/src/tensor_operation_instance/gpu/gemm_universal/CMakeFiles/device_gemm_universal_instance.dir/device_gemm_xdl_universal_f16_f16_f16/device_gemm_xdl_universal_f16_f16_f16_mk_kn_mn_comp_default_instance.cpp.o:(.rodata._ZN2ck16tensor_operation6device27getGemmSpecializationStringB5cxx11ERKNS1_18GemmSpecializationE+0x0): relocation R_X86_64_PC32 out of range: 3010561409 is not in [-2147483648, 2147483647]; references section '.text._ZN2ck16tensor_operation6device27getGemmSpecializationStringB5cxx11ERKNS1_18GemmSpecializationE'
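Both messages report a displacement that falls outside the signed 32-bit range [-2147483648, 2147483647] printed by the linker. As a minimal sketch (the helper name `fits_in_rel32` is hypothetical and not part of any toolchain), the check on the two reported values can be reproduced like this:

```shell
# Hypothetical helper: report whether a displacement fits in the signed
# 32-bit range that ld.lld prints in the error messages above.
fits_in_rel32() {
    disp=$1
    if [ "$disp" -ge -2147483648 ] && [ "$disp" -le 2147483647 ]; then
        echo "in range"
    else
        echo "out of range"
    fi
}

# The two displacements taken from the error messages in this issue.
fits_in_rel32 -3824024220
fits_in_rel32 3010561409
```

Both values exceed 2 GiB in magnitude, which suggests the referenced symbols ended up more than 2 GiB away from the code that refers to them.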

Operating System

ubuntu 20.04

CPU

Ryzen

GPU

AMD Instinct MI300

Other

No response

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

sudo docker run --rm -it  --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined -e HIP_VISIBLE_DEVICES=2 -v /home/zhengjli:/workspace -w /workspace  rocm/composable_kernel:ck_ub20.04_rocm6.1_amd-staging 
cd composable_kernel/
mkdir build && cd build 
cmake                                            \
-D CMAKE_PREFIX_PATH=/opt/rocm                    \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc          \
-D CMAKE_CXX_FLAGS="-O1 -mcmodel=large -fPIC"       \
-D CMAKE_BUILD_TYPE=Debug                         \
-D GPU_TARGETS="gfx941"                      \
.. 

make -j128 install 

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

This follows up on issue https://github.com/ROCm/composable_kernel/issues/508.

The Release build problem reported there has been fixed.

Thanks a lot, David.

ZJLi2013, Aug 01 '24 10:08

@ZJLi2013 Internal ticket has been created to investigate this issue. Thanks!

ppanchad-amd, Sep 25 '24 15:09

@ZJLi2013 Some suggestions from the CK team (@illsilin):

  1. "The debug build for the entire library never really worked. Every time the compiler sees the inline assembly in the code, it freaks out. So if a user wants to run some specific kernel in debug mode, they should build that specific example or instance with debug flags, but not the entire library."
  2. "Up until ROCm 6.2, every time you tried building a large library or binary, the linker could not handle anything over 2 GB. The size could be due either to building for multiple targets or to building in debug mode. Starting from ROCm 6.2, there is a new compiler flag, '--offload-compress', which helps decrease the size of the device code."

Can you try those suggestions for your builds?
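As a rough sketch of how the two suggestions could be combined (an illustration, not a command line confirmed by the CK team): the helpers below only print a configure command and a single-target build command, so nothing is run. The function names `extra_flags_for_rocm` and `print_debug_build` are hypothetical, the target name `device_gemm_universal_instance` is taken from the build log above, and the version check assumes a ROCm version string of the form major.minor[.patch].

```shell
# Hypothetical helper: emit --offload-compress only for ROCm 6.2 or newer,
# the version from which the flag is said to be available.
extra_flags_for_rocm() {
    major=${1%%.*}
    case "$1" in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 2 ]; }; then
        echo "--offload-compress"
    fi
}

# Hypothetical helper: print (do not run) a Debug configure command and a
# build command for ONE target instead of the whole library.
print_debug_build() {
    flags=$(extra_flags_for_rocm "$1")
    target=$2
    printf 'cmake -D CMAKE_PREFIX_PATH=/opt/rocm -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc -D CMAKE_BUILD_TYPE=Debug -D CMAKE_CXX_FLAGS="%s" -D GPU_TARGETS="gfx941" ..\n' "$flags"
    printf 'make -j"$(nproc)" %s\n' "$target"
}

print_debug_build 6.2.0 device_gemm_universal_instance
```

The printed commands are meant to be reviewed and then run from the build directory. Whether a single-target Debug build stays under the linker's limit still has to be checked on the actual setup.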

huanrwan-amd, Sep 30 '24 15:09

Closing this issue, as there have been no updates for 3 weeks.

huanrwan-amd, Oct 18 '24 18:10