rules_cuda
[BUG] Linking fails when using a custom cc_toolchain for a non-host platform CPU: multi-platform CUDA libraries are not detected
brief
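Linking a cc_binary that depends on a cuda_library fails when targeting aarch64 with a custom cross-compiling cc_toolchain. NOTE: on the default platform (the x86_64 / k8 toolchain), compiling and linking work. I wonder whether this is a bug or just a misconfiguration in how I am using this repo.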
environment
bazel: 7.0.2
cc toolchain: //bazel/toolchains/v5l (a custom cc toolchain for cross-compiling to aarch64, similar to https://github.com/f0rmiga/gcc-toolchain/blob/main/toolchain/cc_toolchain_config.bzl)
├── toolchains
│   └── v5l
│       ├── BUILD
│       ├── v5l.BUILD
│       └── v5l_cc_toolchain_config.bzl
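The contents of //bazel/toolchains/v5l/BUILD are not shown in this issue. For context, a minimal sketch of how a cross-compiling cc toolchain is commonly wired up with platforms: a platform describing the aarch64 target, plus a toolchain() that declares the tools run on an x86_64 host (exec) but produce aarch64 binaries (target). The names aarch64_linux, v5l_toolchain and v5l_cc_toolchain are hypothetical.

```starlark
# Hypothetical contents of //bazel/toolchains/v5l/BUILD (all target names are assumptions).

# Target platform: Linux on aarch64.
platform(
    name = "aarch64_linux",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:aarch64",
    ],
)

# The compiler runs on an x86_64 host (exec) and emits aarch64 code (target).
toolchain(
    name = "v5l_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:aarch64",
    ],
    toolchain = ":v5l_cc_toolchain",  # hypothetical cc_toolchain target
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)
```

Such a toolchain would be registered with register_toolchains("//bazel/toolchains/v5l:v5l_toolchain") and selected by building with --platforms=//bazel/toolchains/v5l:aarch64_linux. Keeping exec and target constraints separate is what lets the C++ side cross-compile.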
repro steps
Simply compiling the basic example below (a cc_binary that depends on a cuda_library) reports the following errors. NOTE: on the default platform (the x86_64 / k8 toolchain), the same build works.
cc_binary(
    name = "module_cuda_main",
    srcs = ["tool/module_cuda_main.cpp"],
    includes = ["include"],
    tags = ["tool"],
    visibility = ["//main:__pkg__"],
    deps = [
        ":module_cuda",
    ],
)
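The :module_cuda dependency itself is not shown. A minimal sketch of what it presumably looks like, using rules_cuda's cuda_library rule; the source and header file names are hypothetical.

```starlark
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

# Hypothetical definition of the :module_cuda target that module_cuda_main depends on.
cuda_library(
    name = "module_cuda",
    srcs = ["src/module_cuda.cu"],     # hypothetical kernel source
    hdrs = ["include/module_cuda.h"],  # hypothetical public header
)
```

Nothing in a target like this names a CPU architecture; the CUDA runtime it links against is supplied by the detected toolkit (the @local_cuda repository in the log below).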
(03:11:14) INFO: Current date is 2024-08-08
(03:11:14) INFO: Analyzed 323 targets (0 packages loaded, 21 targets configured).
(03:11:14) ERROR: /gw_demo/modules/team_demo/module_demo/BUILD:57:15: Linking modules/team_demo/module_demo/module_cuda_main failed: (Exit 1): aarch64-buildroot-linux-gnu-gcc failed: error executing CppLink command (from target //modules/team_demo/module_demo:module_cuda_main)
(cd /home/zs/.cache/bazel/_bazel_zs/2c098eac6c684e1fabebb74f5f4483bd/execroot/gaos && \
exec env - \
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/lib/x86_64-linux-gnu:/opt/rti.com/rti_connext_dds-6.0.1/lib/x64Linux4gcc7.3.0:/opt/ros/humble/opt/rviz_ogre_vendor/lib:/opt/ros/humble/lib/x86_64-linux-gnu:/opt/ros/humble/lib \
PATH=/usr/local/cuda/bin:/opt/rti.com/rti_connext_dds-6.0.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/ros/humble/bin \
PWD=/proc/self/cwd \
external/v5l_cc_toolchain/bin/aarch64-buildroot-linux-gnu-gcc -o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/module_cuda_main -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64 
-Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/_objs/module_cuda_main/module_cuda_main.pic.o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/libmodule_cuda.a external/local_cuda/cuda/lib64/libcudadevrt.a -lcudart -l:libcudart.so.11.0 -l:libcudart.so.11.4.409 -lcudart -lcuda -pie -ldl -lpthread -lrt -Wl,-rpath,lib/ -L/drive/drive-linux/lib-target/ -L/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/lib-target/ -Wl,-rpath-link,/usr/lib/aarch64-linux-gnu -Wl,-rpath-link,/usr/aarch64-linux-gnu -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/lib/aarch64-linux-gnu -lgcov -lstdc++ -no-canonical-prefixes)
# Configuration: 93bfd7653555f545157f5fbb9812135069a379b953233ab0eef19c8f88c3340d
# Execution platform: @@local_config_platform//:host
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
collect2: error: ld returned 1 exit status
(03:11:14) INFO: Elapsed time: 0.602s, Critical Path: 0.08s
(03:11:14) INFO: 2 processes: 2 internal.
(03:11:14) ERROR: Build did NOT complete successfully
considerations
skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
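which means the CUDA runtime being linked is still the host (x86_64) build from /usr/local/cuda/lib64 (exposed as external/local_cuda/cuda/lib64), not an aarch64 one.
In practice, CUDA libraries may be installed in other directories and may include builds for multiple architectures, especially on NVIDIA AGX machines. For example, NVIDIA's DL4AGX contributing guide describes this layout: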
- CUDA: Should be installed at /usr/local, CUDA for various platforms should be in the target directory of /usr/local/cuda-X
  - e.g. aarch64-linux CUDA 10.1 should be located at /usr/local/cuda-10.1/targets/aarch64-linux
- CUDA-X DL Libs (i.e. TensorRT and cuDNN): Should be located at /usr/local/cuda-X/dl/targets/<PLATFORM>/{include, lib}
- Other system dependencies: Dependencies should be located in /usr/local/{include, lib} for x86_64, /usr/aarch64-linux-gnu/ for aarch64-linux and /usr/aarch64-unknown-nto-qnx/aarch64le for aarch64-qnx
https://github.com/NVIDIA/DL4AGX/blob/9a4f60c2847d32e81372b9a2165299a3b65eabf1/CONTRIBUTING.md?plain=1#L201-L205
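With that layout, one way to expose the aarch64 runtime to Bazel is a separate local repository rooted at the target directory. The link line above already references a @cuda_aarch64_linux repository with :cudart (in lib) and :cuda (in lib/stubs) targets, so something along these lines may already exist in this workspace; the sketch below is an assumption about its shape, not its actual definition. It assumes a WORKSPACE-based setup and CUDA installed at /usr/local/cuda.

```starlark
# WORKSPACE (sketch): expose the aarch64 CUDA target directory as its own repository.
new_local_repository(
    name = "cuda_aarch64_linux",
    path = "/usr/local/cuda/targets/aarch64-linux",
    build_file_content = """
# aarch64 CUDA runtime; a .so in srcs makes Bazel emit -L<solib dir> -lcudart.
cc_library(
    name = "cudart",
    srcs = ["lib/libcudart.so"],
    hdrs = glob(["include/**/*.h"]),
    includes = ["include"],
    visibility = ["//visibility:public"],
)

# Driver API stub, used only at link time (the real libcuda.so is on the device).
cc_library(
    name = "cuda",
    srcs = ["lib/stubs/libcuda.so"],
    visibility = ["//visibility:public"],
)
""",
)
```

This only adds the aarch64 libraries alongside the ones rules_cuda detects; it does not stop the @local_cuda runtime (and its versioned -l:libcudart.so.11.0 flags) from reaching the link line, which is why the failure above persists.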
related info
There is an older CUDA toolchain configuration that supports multi-platform CPU builds in Bazel, but it is based on CROSSTOOL, which is outdated and no longer supported by recent Bazel versions.
https://github.com/NVIDIA/DL4AGX/tree/master
EDIT: There is already an issue about resolving multiple versions of CUDA libraries, but I don't think it addresses this problem by design: https://github.com/bazel-contrib/rules_cuda/issues/113
workaround (works)
Manually adding the aarch64 CUDA library path to linkopts lets the binary link successfully:
linkopts = [
"-L/usr/local/cuda/targets/aarch64-linux/lib",
],
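Because this hard-codes an aarch64-only path, it may affect the default x86_64 build as well. A sketch that applies the flag only when the target CPU is aarch64, using select() on the @platforms CPU constraint. It assumes the aarch64 build sets its target through a platform carrying @platforms//cpu:aarch64; with a --cpu-only configuration the select would fall through to the default branch.

```starlark
cc_binary(
    name = "module_cuda_main",
    srcs = ["tool/module_cuda_main.cpp"],
    includes = ["include"],
    tags = ["tool"],
    visibility = ["//main:__pkg__"],
    deps = [":module_cuda"],
    # Point the linker at the aarch64 CUDA libraries only when targeting aarch64;
    # every other target CPU (e.g. the default x86_64 build) gets no extra flags.
    linkopts = select({
        "@platforms//cpu:aarch64": ["-L/usr/local/cuda/targets/aarch64-linux/lib"],
        "//conditions:default": [],
    }),
)
```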
I found this on the documentation page:
rules_cuda_dependencies(toolkit_path)
Populate the dependencies for rules_cuda. This will set up workspace dependencies (other bazel rules) and local toolchains.
Parameter toolkit_path: Optionally specify the path to CUDA toolkit. If not specified, it will be detected automatically.
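Based on that description, the parameter would be passed in WORKSPACE roughly as follows. The load label and register_detected_cuda_toolchains follow the rules_cuda README of that time and may differ in other versions; the toolkit path is only an example (the log shows CUDA 11.4). Since toolkit_path describes the whole toolkit used for compilation, pointing it at an aarch64 targets/ directory alone would presumably not provide nvcc.

```starlark
# WORKSPACE (sketch; labels as in the rules_cuda README of that era)
load(
    "@rules_cuda//cuda:repositories.bzl",
    "register_detected_cuda_toolchains",
    "rules_cuda_dependencies",
)

# Use an explicit toolkit instead of auto-detection.
# Example path only; it is expected to be a full toolkit containing bin/nvcc.
rules_cuda_dependencies(toolkit_path = "/usr/local/cuda-11.4")

register_detected_cuda_toolchains()
```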
Is there an example showing how to use it correctly?
I think you are configuring it correctly. The root problem is that cross-compiling is not addressed by this rule at the moment: exec_compatible_with for the tools and target_compatible_with for the runtime are assumed to be the same. This is not enforced, though, so it can be worked around.
OK, I will keep the issue open.