llvm-project Link to LLVM failed

I'm trying to build comgr from source, with the following cmake config:

cmake -S . -B build \
-DCMAKE_BUILD_TYPE=Release \
-DAMDDeviceLibs_DIR=$HOME/opt/rocm/5.4.3/lib/cmake/AMDDeviceLibs \
-DLLD_DIR=$HOME/opt/llvm/15.0.7/lib/cmake/lld \
-DClang_DIR=$HOME/opt/llvm/15.0.7/lib/cmake/clang \
-DROCM_DIR=$HOME/opt/rocm/5.4.3/share/rocm/cmake \
-DCMAKE_INSTALL_PREFIX=$PWD/install

cmake --build build

The build fails with linker errors reporting undefined references to LLVM symbols:

/usr/bin/ld: CMakeFiles/amd_comgr.dir/src/comgr-metadata.cpp.o: in function `COMGR::metadata::getMetadataRoot(COMGR::DataObject*, COMGR::DataMeta*)':
comgr-metadata.cpp:(.text+0x2b5): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x2cf): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x2f9): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x313): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x4ad): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x4c2): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x4ec): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x501): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x715): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x72f): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x759): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x773): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x8fd): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x917): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x941): undefined reference to `llvm::object::object_category()'
/usr/bin/ld: comgr-metadata.cpp:(.text+0x95b): undefined reference to `llvm::StringError::StringError(llvm::Twine const&, std::error_code)'
/usr/bin/ld: CMakeFiles/amd_comgr.dir/src/comgr-metadata.cpp.o: in function `COMGR::metadata::getELFObjectFileBase(COMGR::DataObject*)':

I'm using the latest AMD LLVM and device libs from the amd-stg-open branch. The full build log is attached as build.log. Is there any way to resolve this? Thanks in advance.

Mar 04 '23 02:03 xuantengh

The AMD LLVM project is configured as follows:

cmake -S llvm -B build -G "Ninja" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_CXX_COMPILER=g++ \
-DBUILD_SHARED_LIBS \
-DLLVM_ENABLE_LIBCXX=ON \
-DLLVM_ENABLE_PROJECTS="llvm;clang;lld" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libc;libcxx;libcxxabi" \
-DCMAKE_INSTALL_PREFIX=$HOME/opt/rocm/5.4.3 \
-DLLVM_TARGETS_TO_BUILD="X86;AMDGPU"
cmake --build build
cmake --build build --target install

This issue can be fixed by disabling shared library support in the AMD LLVM project (i.e., removing the -DBUILD_SHARED_LIBS option). I'm not sure why enabling this option makes the comgr build fail.

Mar 04 '23 07:03 xuantengh

Building Comgr against an LLVM configured with -DBUILD_SHARED_LIBS=ON is currently not supported. There is an ongoing effort to switch to shared/dynamic libraries, but it is not quite ready yet.

I will update here once shared library support is ready, but for now you should try building with static LLVM libraries.
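To check which kind of LLVM build you actually have before configuring Comgr, something along these lines might help. It is only a sketch: the helper name llvm_cache_is_shared is made up for illustration, and it assumes you still have the LLVM build directory with its CMakeCache.txt.

```shell
# Hypothetical helper (not part of LLVM or Comgr): report whether an LLVM
# build tree was configured with BUILD_SHARED_LIBS enabled, by reading the
# CMakeCache.txt in that build directory.
# Returns 0 if shared libs are enabled, 1 if not, 2 if no cache file exists.
# Usage: llvm_cache_is_shared /path/to/llvm-project/build
llvm_cache_is_shared() {
    cache="$1/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $1" >&2
        return 2
    fi
    # Cache entries look like NAME:TYPE=VALUE; match CMake's usual true values.
    grep -Eiq '^BUILD_SHARED_LIBS:[A-Z]+=(ON|TRUE|YES|Y|1)$' "$cache"
}
```

If the function succeeds for your LLVM build directory, that LLVM was configured with shared libraries and would need to be reconfigured without the option before building Comgr against it.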

Mar 27 '23 21:03 lamb-j

Hi. I am facing a similar issue. I get undefined symbol errors like:

ld.lld: error: undefined symbol: llvm::cl::ResetAllOptionOccurrences()
>>> referenced by comgr.cpp:366 (/builddir/build/BUILD/ROCm-CompilerSupport-rocm-5.7.1/lib/comgr/src/comgr.cpp:366)
>>>               CMakeFiles/amd_comgr.dir/src/comgr.cpp.o:(COMGR::clearLLVMOptions())

Indeed, with -DBUILD_SHARED_LIBS=OFF the build passes.

The strange thing is that the Arch package does NOT use this: https://gitlab.archlinux.org/archlinux/packaging/packages/comgr/-/blob/a673a34303cc87e813784a7a905da539ee92ac06/PKGBUILD

What is the impact of using -DBUILD_SHARED_LIBS=OFF for the rest of the stack build?

Oct 30 '23 20:10 squid-f

What is the impact of using -DBUILD_SHARED_LIBS=OFF for the rest of the stack build?

I have not yet faced any issue related to this.

Oct 31 '23 02:10 xuantengh

Thanks @Huangxt57 for your prompt reply. I have built the stack up to https://github.com/ROCm-Developer-Tools/clr to try to bring OpenCL to my RX6600, and it does not work (yet). The thing is that building clr also requires -DBUILD_SHARED_LIBS=OFF. I am now getting a detection issue with the clinfo provided by ROCm. Since a .a is created instead of a .so, should I point to this .a in the icd file?

Oct 31 '23 19:10 squid-f

Hi. Building rocm-clr with -DBUILD_SHARED_LIBS=OFF fails to provide OpenCL support, as reported in https://github.com/ROCm-Developer-Tools/clr/issues/26

I then decided to patch rocm-clr so that I could get libamdocl64.so: rocm-compilersupport-5.7.1-allow-lld-undefined.patch.txt

What could be the drawbacks of doing so?

Nov 19 '23 17:11 squid-f

@squid-f What's your final objective here? Are you just trying to get a build of Comgr and CLR working together?

You should be able to build both CLR and Comgr with their default "BUILD_SHARED_LIBS" settings. I don't normally set either when building the two separate projects.

For CLR, I use something like the following to build:

cmake -DCLR_BUILD_HIP=ON -DCLR_BUILD_OCL=ON -DHIP_COMMON_DIR=/path/to/hip -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j32
make install

For Comgr:

export LLVM_PROJECT=/path/to/llvm-project/build  # AMD LLVM branch
export DEVICE_LIBS=/path/to/llvm-project/amd/device-libs/build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$LLVM_PROJECT;$DEVICE_LIBS" ..
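Before running that configure step, it may be worth confirming that each directory in CMAKE_PREFIX_PATH really contains a CMake package config file. The sketch below is an assumption-laden sanity check, not something Comgr provides: has_cmake_package is a hypothetical name, and it searches the whole prefix, which is looser than the specific subdirectories find_package actually looks in.

```shell
# Hypothetical helper: succeed if a CMake package config file for PKG exists
# anywhere under PREFIX. CMake's config mode looks for <PKG>Config.cmake or
# <lowercase-pkg>-config.cmake; this check only confirms such a file exists,
# not that find_package would locate it.
# Usage: has_cmake_package /path/to/prefix LLVM
has_cmake_package() {
    prefix="$1"
    pkg="$2"
    lower=$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')
    found=$(find "$prefix" \( -name "${pkg}Config.cmake" \
        -o -name "${lower}-config.cmake" \) -print 2>/dev/null | head -n 1)
    [ -n "$found" ]
}
```

For example, has_cmake_package "$LLVM_PROJECT" LLVM and has_cmake_package "$DEVICE_LIBS" AMDDeviceLibs should both succeed if the two build trees are where the exports say they are.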

Nov 20 '23 16:11 lamb-j

Hi @lamb-j. My final goal is to bring OpenCL support for AMD Navi GPU architectures to Mageia Linux. For that, I have followed the Arch packaging strategy and built the following packages: Rocm-llvm, Rocm-cmake, Rocm-core, Hsakmt, Rocm-device-libs, Rocm-compilersupport (comgr), Rocm-runtime (hsa-rocr), Rocminfo, Rocm-clr.

As @Huangxt57 found, Comgr doesn't build with -DBUILD_SHARED_LIBS=ON. But if I use -DBUILD_SHARED_LIBS=OFF, I cannot build CLR later on. So I patched Comgr to build it with -DBUILD_SHARED_LIBS=ON (or, as you say, equivalently without setting it at all). However, my GPU is not yet detected as a device within the platform (as explained in https://github.com/ROCm-Developer-Tools/clr/issues/26 ).

Any clue why?

Thanks

Nov 20 '23 21:11 squid-f

I'd suggest following Fedora if you need to stick to shared libs. Fedora uses the upstream LLVM release branches, whereas RadeonOpenCompute/llvm-project follows llvm-project upstream's trunk ("main" branch), so the API is unstable, which defeats the benefit of shared libs. If you want a more stable LLVM experience, I highly recommend following Fedora in this regard.

If you can build with shared libs OFF, then you should be able to use RadeonOpenCompute/llvm-project without issue as long as they are from the same release branch, e.g. "5.7.x".

As for building from amd-stg-open, you'll have more of a mixed bag as to what will and will not compile against clr. In theory, you should be able to build clr's develop branch against amd-stg-open, but your mileage may vary.

Nov 20 '23 22:11 Mystro256

-DLLD_DIR=$HOME/opt/llvm/15.0.7/lib/cmake/lld
-DClang_DIR=$HOME/opt/llvm/15.0.7/lib/cmake/clang \

I noticed you're using LLVM 15. Is there any reason the distro doesn't have 16 or 17 available?

Either way, I have some forks available if you need to use llvm upstream rather than RadeonOpenCompute/llvm-project: https://github.com/Mystro256/ROCm-Device-Libs/branches https://github.com/Mystro256/ROCm-CompilerSupport/branches

Note that the LLVM 17 branch should be the same as the official RadeonOpenCompute one. These are the branches used by Fedora since LLVM 14.

Nov 20 '23 22:11 Mystro256

Thanks @Mystro256 for stepping in. The first post you are referring to is not from me. I do use LLVM 17, from a rocm-llvm package that I built specifically to build the rest of the stack. I have followed the Arch approach here. The weird thing is that Arch seems able to build with shared libs and I can't, even though I use the exact same build flags. Hence the patch I had to develop.

Nov 21 '23 07:11 squid-f

I think the shared library issue is orthogonal to your issue. It's fine to build Comgr as a shared library. That's the default currently, and you shouldn't need any patch or BUILD_SHARED_LIBS options to have that working. It's also fine to build CLR as a shared library. Again the default, no patches or CMake variables needed.

The one thing you can't currently do is build LLVM as a shared library and use that when building Comgr. But you'll only hit that issue if you manually set BUILD_SHARED_LIBS=ON in your LLVM build (this is what @Huangxt57 hit in this original issue), because the LLVM default is static libraries.
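If only an installed LLVM is at hand (no build directory), its llvm-config tool can report how its libraries link via the --shared-mode option. A minimal sketch of a guard built on that, with hypothetical function names chosen for illustration:

```shell
# Hypothetical helpers wrapping llvm-config; the function names are made up.
# llvm_link_mode prints "shared" or "static" as reported by llvm-config.
# Usage: llvm_link_mode /path/to/llvm/bin/llvm-config
llvm_link_mode() {
    llvm_config="${1:-llvm-config}"
    "$llvm_config" --shared-mode
}

# Return 0 only for a static LLVM, 1 for any other mode,
# and 2 if llvm-config itself could not be run.
require_static_llvm() {
    mode=$(llvm_link_mode "$1") || return 2
    if [ "$mode" != "static" ]; then
        echo "LLVM reports '$mode' linking; Comgr expects static LLVM libraries" >&2
        return 1
    fi
}
```

Calling require_static_llvm with the llvm-config from the LLVM you intend to use, before configuring Comgr, should catch the shared-library case early instead of at link time.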

Nov 21 '23 15:11 lamb-j

The one thing you can't currently do is build LLVM as a shared library and use that when building Comgr. But you'll only hit that issue if you manually set BUILD_SHARED_LIBS=ON in your LLVM build (this is what @Huangxt57 hit in this original issue), because the LLVM default is static libraries.

Hi.

Here are the flags used to build rocm-llvm 5.7.1:

-G Ninja \
        -B build \
        -S "./llvm" \
        -DCMAKE_BUILD_TYPE=RelWithDebInfo \
        -DCMAKE_INSTALL_PREFIX=%{install_prefix} \
        -DLLVM_HOST_TRIPLE=%{_host} \
        -DLLVM_DEFAULT_TARGET_TRIPLE=%{_host} \
        -DLLVM_ENABLE_PROJECTS='llvm;clang;compiler-rt;lld' \
        -DLLVM_TARGETS_TO_BUILD='AMDGPU;NVPTX;X86' \
        -DCLANG_DEFAULT_LINKER=lld \
        -DLLVM_INSTALL_UTILS=ON \
        -DLLVM_ENABLE_BINDINGS=OFF \
        -DLLVM_LINK_LLVM_DYLIB=OFF \
        -DLLVM_BUILD_LLVM_DYLIB=OFF \
        -DLLVM_LINK_LLVM_DYLIB=OFF \
        -DLLVM_ENABLE_ASSERTIONS=ON \
        -DOCAMLFIND=NO \
        -DLLVM_ENABLE_OCAMLDOC=OFF \
        -DLLVM_INCLUDE_BENCHMARKS=OFF \
        -DLLVM_BUILD_TESTS=ON \
        -DLLVM_INCLUDE_TESTS=ON \
        -DCLANG_INCLUDE_TESTS=ON \
        -DLLVM_BINUTILS_INCDIR=%{_includedir}

However, the log output shows:

/usr/bin/cmake -Wno-dev -S . -B build -DCMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-DNDEBUG -DCMAKE_C_FLAGS_RELWITHDEBINFO:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_LIBDIR:PATH=lib64 -DCMAKE_INSTALL_LIBEXECDIR:PATH=libexec -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_INSTALL_RUNSTATEDIR:PATH=/run -DCMAKE_INSTALL_SYSCONFDIR:PATH=/etc -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLIB_SUFFIX=64 -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON '-DCMAKE_MODULE_LINKER_FLAGS=-Wl,--as-needed  -Wl,-z,relro -Wl,-O1 -Wl,--build-id=sha1 -Wl,--enable-new-dtags' -DBUILD_SHARED_LIBS:BOOL=ON -DBUILD_STATIC_LIBS:BOOL=OFF -G Ninja -B build -S ./llvm -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/usr/lib64/rocm/llvm -DLLVM_HOST_TRIPLE=x86_64-mageia-linux-gnu -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-mageia-linux-gnu '-DLLVM_ENABLE_PROJECTS=llvm;clang;compiler-rt;lld' '-DLLVM_TARGETS_TO_BUILD=AMDGPU;NVPTX;X86' -DCLANG_DEFAULT_LINKER=lld -DLLVM_INSTALL_UTILS=ON -DLLVM_ENABLE_BINDINGS=OFF -DLLVM_LINK_LLVM_DYLIB=OFF -DLLVM_BUILD_LLVM_DYLIB=OFF -DLLVM_LINK_LLVM_DYLIB=OFF -DLLVM_ENABLE_ASSERTIONS=ON -DOCAMLFIND=NO -DLLVM_ENABLE_OCAMLDOC=OFF -DLLVM_INCLUDE_BENCHMARKS=OFF -DLLVM_BUILD_TESTS=ON -DLLVM_INCLUDE_TESTS=ON -DCLANG_INCLUDE_TESTS=ON -DLLVM_BINUTILS_INCDIR=/usr/include

So, it looks like something adds -DBUILD_SHARED_LIBS:BOOL=ON -DBUILD_STATIC_LIBS:BOOL=OFF. I will investigate whether a macro from my build system does that.
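To spot this kind of injected flag in a long logged command line, a small filter can help. This is only a sketch: effective_shared_libs is a hypothetical helper, and it relies on cmake applying -D options in order so that the last occurrence of a variable is the one that takes effect.

```shell
# Hypothetical helper: print the BUILD_SHARED_LIBS value that takes effect
# in a logged cmake command line (passed as one string). When the option
# appears several times, the last occurrence is printed. Prints nothing if
# the option is absent.
effective_shared_libs() {
    printf '%s\n' "$1" |
        grep -oE -- "-DBUILD_SHARED_LIBS(:[A-Z]+)?=[^ '\"]+" |
        tail -n 1 |
        sed 's/^[^=]*=//'
}
```

Run against the logged command above, this would print ON. Since later -D options override earlier ones, appending an explicit -DBUILD_SHARED_LIBS=OFF after the flags the build system injects would presumably restore the static default, though that depends on where the macro places its own options.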

Nov 22 '23 07:11 squid-f

So, in conclusion: my build system was indeed overriding the default setting of rocm-llvm, so it was not being built with BUILD_SHARED_LIBS=OFF. I fixed that so rocm-llvm is built with static libs. I was then able to build ROCm-CompilerSupport with BUILD_SHARED_LIBS=ON without any patch. I can now also build rocm-clr with BUILD_SHARED_LIBS=ON.

The issue is solved for me. Thanks @lamb-j @Mystro256

I might open another report, as my GPU gfx1032 (AMD Radeon RX 6600) is still not found by OpenCL...

Nov 23 '23 21:11 squid-f

Hi. I need to correct myself: thanks to rocm-llvm being built with BUILD_SHARED_LIBS=OFF, I was able to build rocm-clr with BUILD_SHARED_LIBS=ON, and now my GPU gfx1032 (AMD Radeon RX 6600) is found by OpenCL! So there is now a complete ROCm stack available for Mageia Linux.

Thanks again!

Nov 24 '23 20:11 squid-f

Nice work with the investigation @squid-f, and thanks for your efforts to port ROCm to Mageia.

I'm leaving this ticket open, as @Huangxt57's original issue still exists (can't build Comgr against a shared-lib LLVM).

Nov 27 '23 18:11 lamb-j