Build failure: unable to find library -lhsakmt
ld.lld: error: unable to find library -lhsakmt
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
[ 51%] Built target Utils.cpp-gfx906.bc
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
This is after running:
./babs.sh -up
./babs.sh --clean
./babs.sh -b
Git revision is 84faa05. I was attempting to test on my MI60 but haven't been able to get a clean build on Arch Linux.
Have you been able to build the "016_03_llvm_project_openmp" project earlier? I know that some people have used Arch Linux before. Do you have multiple versions of the library? What do you get if you run:
cd /opt
find -name libhsakmt.so
I have
./rocm_sdk_612/lib64/libhsakmt.so
./rocm_sdk_612/lib/libhsakmt.so
All libhsak* versions in lib-directory are symlinks to lib64.
ls -la /opt/rocm_sdk_612/lib/libhsakmt.*
lrwxrwxrwx 1 lamikr lamikr 35 Nov 12 00:13 /opt/rocm_sdk_612/lib/libhsakmt.a -> /opt/rocm_sdk_612/lib64/libhsakmt.a
lrwxrwxrwx 1 lamikr lamikr 36 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so -> /opt/rocm_sdk_612/lib/libhsakmt.so.1*
lrwxrwxrwx 1 lamikr lamikr 40 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so.1 -> /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6*
lrwxrwxrwx 1 lamikr lamikr 42 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6 -> /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6*
And let's check that all ldd dependencies are found. What does this show:
ldd /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6
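When the ldd output is long, the unresolved entries are easy to miss. A minimal sketch of a filter for them, assuming the usual ldd convention of marking an unresolved library with "not found"; the function name list_missing_deps is made up for illustration and is not part of babs.sh:

```shell
# Print only the unresolved entries from `ldd` output read on stdin.
# ldd marks a library it cannot resolve with "=> not found".
# The "|| true" keeps the exit status at 0 when nothing is missing.
list_missing_deps() {
    grep 'not found' || true
}

# Usage (path taken from the listing above):
#   ldd /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6 | list_missing_deps
```

Empty output means every dependency was resolved by the dynamic loader.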
After the recent posts in the other ticket, it's building and appears to be progressing further. I will update here with the results once it completes (or fails).
Thanks for letting me know; it would be nice to know what caused that break. So you have a Vega VII to test gfx906 with?
I have 2x MI60 (or 32GB MI50, whichever it really is). The build stopped a while ago; I reran ./babs.sh -b and it failed here:
adding 'torchvision-0.20.0a0+324eea9.dist-info/LICENSE'
adding 'torchvision-0.20.0a0+324eea9.dist-info/METADATA'
adding 'torchvision-0.20.0a0+324eea9.dist-info/WHEEL'
adding 'torchvision-0.20.0a0+324eea9.dist-info/top_level.txt'
adding 'torchvision-0.20.0a0+324eea9.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
corrupted size vs. prev_size in fastbins
./build_rocm.sh: line 18: 3102616 Aborted (core dumped) ROCM_PATH=${install_dir_prefix_rocm} FORCE_CUDA=1 TORCHVISION_USE_NVJPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 CC=${CMAKE_C_COMPILER} CXX=${CMAKE_CXX_COMPILER} python setup.py bdist_wheel
build failed: pytorch_vision
error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612
Hmm... Not really sure what is going on. In theory the benchmark should now be able to run some PyTorch tests, since the build has passed the PyTorch step and is now trying to build pytorch vision.
So are you able to test with:
source /opt/rocm_sdk_612/bin/env_rocm.sh
cd /opt/rocm_sdk_612/benchmarks
./run_and_save_benchmarks.sh
If you are on the master branch, can you run these commands one more time to verify that everything is up to date, and then restart the pytorch vision build from clean?
./babs.sh -up
./babs.sh -ca
./babs.sh --clean binfo/core/039_03_pytorch_vision.binfo
./babs.sh -b
I have started a clean build myself on Fedora 40 with gfx906 as the only target, but I need to wait until morning to see the results.
[cb88@M31-AR0 ~]$ cat /opt/rocm_sdk_612/benchmarks/bench.txt
Timestamp for benchmark results: 20241218_133446
Saving to file: 20241218_133446_cpu_vs_gpu_simple.txt
Benchmarking CPU and GPUs
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-de7055040
Device: AMD EPYC 7352 24-Core Processor
'CPU time: 35.332 sec
Device: AMD Radeon Graphics
'GPU time: 0.604 sec
Benchmark ready
Saving to file: 20241218_133446_pytorch_dot_products.txt
Pytorch version: 2.4.1
dot product calculation test
tensor([[[ 0.8124, 0.2179, -0.4919, -0.4980, -0.6716, 1.2153, -0.0119, -0.9560],
[-0.7172, 0.4881, 0.9783, -0.3172, -0.0765, 1.5946, -0.1057, 0.1876],
[ 0.8850, 0.3325, -0.6169, -0.5590, -0.7152, 1.3886, -0.0615, -1.1245]],
[[ 0.2982, -0.1511, 0.2687, -0.8882, 0.1656, 0.1409, -1.0829,
0.6578],
[-0.2719, 0.9328, -0.8428, -0.5765, -0.2355, 0.1816, -0.3346,
-0.5164],
[ 0.8432, 0.4674, -0.1435, 0.2439, -0.3148, 1.1532, -0.3879,
-0.1294]]], device='cuda:0')
Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics / cuda:0
Default benchmark: 3205.060 microseconds, 0.0032050598703790454 sec
SDPBackend.MATH benchmark: 3212.746 microseconds, 0.0032127462700009346 sec
SDPBackend.FLASH_ATTENTION benchmark: SDPBackend.FLASH_ATTENTION cuda:0 is not supported. See warnings for reasons.
SDPBackend.EFFICIENT_ATTENTION benchmark: SDPBackend.EFFICIENT_ATTENTION cuda:0 is not supported. See warnings for reasons.
Device: AMD EPYC 7352 24-Core Processor / cpu
Default benchmark: 3844997.412 microseconds, 3.844997411943041 sec
SDPBackend.MATH benchmark: 3642490.409 microseconds, 3.6424904089653865 sec
SDPBackend.FLASH_ATTENTION benchmark: 3828689.283 microseconds, 3.8286892829928547 sec
SDPBackend.EFFICIENT_ATTENTION benchmark: SDPBackend.EFFICIENT_ATTENTION cpu is not supported. See warnings for reasons.
Summary
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-de7055040
CPU: AMD EPYC 7352 24-Core Processor
Problem parameters:
Sequence-length: 512
Batch-size: 32
Heads: 16
Embed_dimension: 16
Datatype: torch.float16
Device: AMD Radeon Graphics / cuda:0
Default: 3205.060 ms
SDPBackend.MATH: 3212.746 ms
SDPBackend.FLASH_ATTENTION: -1.000 ms
SDPBackend.EFFICIENT_ATTENTION: -1.000 ms
Device: AMD EPYC 7352 24-Core Processor / cpu
Default: 3844997.412 ms
SDPBackend.MATH: 3642490.409 ms
SDPBackend.FLASH_ATTENTION: 3828689.283 ms
SDPBackend.EFFICIENT_ATTENTION: -1.000 ms
[cb88@M31-AR0 opt]$ find -name libhsakmt.so
./rocm_sdk_612/lib64/libhsakmt.so
./rocm_sdk_612/lib/libhsakmt.so
./rocm/lib/libhsakmt.so
/opt/rocm is the binary install from Arch.
[cb88@M31-AR0 opt]$ ls -la /opt/rocm_sdk_612/lib/libhsakmt.*
lrwxrwxrwx 1 cb88 cb88 35 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.a -> /opt/rocm_sdk_612/lib64/libhsakmt.a
lrwxrwxrwx 1 cb88 cb88 36 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so -> /opt/rocm_sdk_612/lib/libhsakmt.so.1
lrwxrwxrwx 1 cb88 cb88 40 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so.1 -> /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6
lrwxrwxrwx 1 cb88 cb88 42 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6 -> /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6
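With several copies of libhsakmt under /opt, it may help to see which directories could satisfy -lhsakmt, in the order they are searched. The following is only a rough sketch: find_lib_in_dirs is a hypothetical helper, not part of the SDK builder, and it only approximates the linker's lookup by checking each directory for lib<name>.so or lib<name>.a. A dangling symlink is treated as missing.

```shell
# Print, in the order given, every directory that holds lib<name>.so or lib<name>.a.
# "-e" follows symlinks, so a broken symlink chain counts as "not there".
find_lib_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Usage, with directories taken from the find output in this thread:
#   find_lib_in_dirs hsakmt /opt/rocm_sdk_612/lib64 /opt/rocm_sdk_612/lib /opt/rocm/lib
```

If the SDK directories are missing from the output while /opt/rocm/lib is present, that would point at a broken symlink chain in the SDK install rather than at the Arch package.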
So it seems that the original problem with the missing symbol in rocBLAS is now solved for you as well, and PyTorch is able to use rocBLAS with the MATH backend. llama.cpp, which was also failing earlier for Said-akbar, should probably work now too if you build it with
./babs.sh -ca binfo/extra/ai_tools.blist
./babs.sh -b binfo/extra/ai_tools.blist
and then run
cd /opt/rocm_sdk_612/docs/examples/llm/llama_cpp/
./run_llama_benchmark.sh
There is still the second problem, with flash-attention, that needs to be solved. And at the moment I have no idea why the pytorch vision build fails for you.
./babs.sh -b binfo/extra/ai_tools.blist ran for a bit then...
-- Found Python: /opt/rocm_sdk_612/bin/python (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /opt/rocm_sdk_612/bin/python.
CMake Error at cmake/utils.cmake:37 (message):
Failed to locate torch path: corrupted size vs. prev_size in fastbins
Call Stack (most recent call first):
cmake/utils.cmake:45 (run_python)
CMakeLists.txt:70 (append_cmake_prefix_path)
-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 483, in
Hmm, the vllm build, which runs before the llama.cpp build, seems to fail with the same kind of error ("corrupted size vs. prev_size in fastbins") as pytorch vision.
How about if you build just llama.cpp:
./babs.sh -b binfo/extra/llama_cpp.binfo
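Both failing builds die with the glibc heap check message "corrupted size vs. prev_size in fastbins", and the torchvision log shows the Python process being aborted. One way to narrow this down might be to run a bare import of torch in the SDK environment and look at the exit status. A small sketch, assuming the common shell convention that a status above 128 means termination by signal (status minus 128); describe_exit is a made-up helper name:

```shell
# Translate a shell exit status into a short description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Usage, after sourcing /opt/rocm_sdk_612/bin/env_rocm.sh:
#   python -c 'import torch'; describe_exit $?
```

A result of "killed by signal 6" (SIGABRT) from the bare import would suggest the crash is in the installed PyTorch itself rather than in the vllm or torchvision build scripts.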
llama.cpp builds and runs successfully.
Something is still not right with it though...
llama_kv_cache_init: ROCm0 KV buffer size = 4000.00 MiB
ggml_cuda_host_malloc: failed to allocate 156000.00 MiB of pinned memory: out of memory
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 163577856032
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
llama_init_from_gpt_params: failed to create context with model
main: error: unable to load model
(This is a small model; koboldcpp can load it fully in VRAM.) It didn't matter whether I passed -ngl 1 or 999; it still tried to allocate a huge buffer and failed.
I noticed from your build log that vllm has been configured with RelWithDebInfo. I am wondering, have you built the whole ROCm SDK stack in that mode instead of Release? (It can be set in envsetup_user.sh.)
If all binaries grow very big, that could perhaps explain the "corrupted size vs. prev_size in fastbins" problem you are seeing. I usually build everything in Release mode first, and then enable the debug build option and rebuild only the individual libraries that I need to debug with gdb.
Google found some similar errors related to process count and running out of memory.
Anyway, here is my vllm log from the same part that fails in your build:
-- Target device: rocm
-- Found Python: /opt/rocm_sdk_612/bin/python (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /opt/rocm_sdk_612/bin/python.
Building PyTorch for GPU arch: gfx1035
-- Found HIP: /opt/rocm_sdk_612 (found suitable version "6.1.40093-9e5ee4609", minimum required is "1.0")
HIP VERSION: 6.1.40093-9e5ee4609
-- Caffe2: Header version is: 6.1.2
***** ROCm version from rocm_version.h ****
ROCM_VERSION_DEV: 6.1.2
ROCM_VERSION_DEV_MAJOR: 6
ROCM_VERSION_DEV_MINOR: 1
ROCM_VERSION_DEV_PATCH: 2
ROCM_VERSION_DEV_INT: 60102
HIP_VERSION_MAJOR: 6
HIP_VERSION_MINOR: 1
TORCH_HIP_VERSION: 601
If you have time, would you be able to do a totally new build with the latest code to check whether the same errors happen again? And do you have an envsetup_user.sh?
These commands should clean up everything and then restart the build. (I added the -ca command just in case, to re-verify that all patches are applied to the projects.)
cd rocm_sdk_builder
sudo rm -rf /opt/rocm_sdk_612
rm -rf builddir
git checkout master
./babs.sh -up
./babs.sh -ca
./babs.sh -b
I did not have an envsetup_user.sh, so it was completely default.
OK, then that does not explain the problem. Maybe I need to install Arch Linux myself and try it out. I think I will first install it in a virtual machine.
Can you check what CMAKE_BUILD_TYPE is set to in, for example, builddir/004_01_roct-thunk-interface_shared/CMakeCache.txt?
I have there: CMAKE_BUILD_TYPE:STRING=Release
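To check the build type of every project at once instead of one cache file at a time, something along these lines could be used. It is a sketch under the assumption that each project has its CMakeCache.txt directly under builddir/<project>/, as in the path above; list_build_types is a hypothetical helper name:

```shell
# Print "<project>: <build type>" for every CMakeCache.txt one level below the build root.
list_build_types() {
    root=$1
    for cache in "$root"/*/CMakeCache.txt; do
        # Skip the unexpanded glob when no cache file exists at all.
        [ -f "$cache" ] || continue
        project=$(basename "$(dirname "$cache")")
        # Cache entries look like CMAKE_BUILD_TYPE:STRING=Release
        build_type=$(sed -n 's/^CMAKE_BUILD_TYPE:[A-Z]*=//p' "$cache")
        printf '%s: %s\n' "$project" "$build_type"
    done
}

# Usage, from the rocm_sdk_builder directory:
#   list_build_types builddir
```

Projects that are not built with CMake (for example the Python wheel builds) have no cache file and simply do not appear in the output.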
It crashed here yesterday when I tried to build. I updated again today and tried again, with the same result. I suspect the new version of cmake on Arch?
[ 9%] Generating source/cube.bc
cd /home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport/test && /opt/rocm_sdk_612/bin/clang-17 -c --offload-arch=gfx900 -emit-llvm -fgpu-rdc --gpu-bundle-output /home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/cube.hip -o source/cube.bc
clang-17: warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target reloc-asm
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Linking C executable bc2h
/usr/bin/cmake -E cmake_link_script CMakeFiles/bc2h.dir/link.txt --verbose=1
[ 9%] Built target reloc2
[ 9%] Built target reloc1
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/square.hip:23:10: fatal error: 'hip/hip_runtime.h' file not found
23 | #include "hip/hip_runtime.h"
| ^~~~~~~~~~~~~~~~~~~
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/double.hip:23:10: fatal error: 'hip/hip_runtime.h' file not found
23 | #include "hip/hip_runtime.h"
| ^~~~~~~~~~~~~~~~~~~
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/cube.hip:23:10: fatal error: 'hip/hip_runtime.h' file not found
23 | #include "hip/hip_runtime.h"
| ^~~~~~~~~~~~~~~~~~~
/usr/bin/cc -I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include -O3 -DNDEBUG -L/opt/rocm_sdk_612/lib64 -L/opt/rocm_sdk_612/lib -L/opt/rocm_sdk_612/hsa/lib -L/opt/rocm_sdk_612/rocblas/lib -L/opt/rocm_sdk_612/hcc/lib -Wl,--dependency-file=CMakeFiles/bc2h.dir/link.d CMakeFiles/bc2h.dir/bc2h.c.o -o bc2h
1 error generated when compiling for gfx900.
1 error generated when compiling for gfx900.
1 error generated when compiling for gfx900.
make[2]: *** [test/CMakeFiles/square.dir/build.make:75: test/source/square.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: *** [test/CMakeFiles/cube.dir/build.make:75: test/source/cube.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[1]: *** [CMakeFiles/Makefile2:5341: test/CMakeFiles/square.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: *** [test/CMakeFiles/double.dir/build.make:75: test/source/double.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[1]: *** [CMakeFiles/Makefile2:5309: test/CMakeFiles/cube.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:5373: test/CMakeFiles/double.dir/all] Error 2
[ 9%] Built target shared-debug
[ 9%] Built target shared
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target bc2h
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target source1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target opencl1.2-c.pch_target
[ 9%] Built target opencl2.0-c.pch_target
make[1]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make: *** [Makefile:166: all] Error 2
build failed: ROCm-CompilerSupport
I really do not have a fix for this for now. I plan to start updating the ROCm base stuff to a newer version soon; I hope that will help. First I will just push out the boost update to 1.87.0 and some xdna/npu stuff that I managed to get working.
Seems to blow up much later now after updating... as well as reinstalling some of my system packages.
LIST_BINFO_FILE_FULLNAME[78]: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
APP_INFO_FULL_NAME: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
---------------------------
[78] BINFO_APP_NAME: AMDMIGraphX
BINFO FILE: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: AMDMIGraphX
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX/.result_install
---------------------------
SHELL=/usr/bin/bash
SDK_CXX_COMPILER_HIP_CLANG=/opt/rocm_sdk_612/bin/clang++
CCACHE_TEMPDIR=/home/cb88/.ccache
CMAKE_BUILD_TYPE_RELWITHDEBINFO=RelWithDebInfo
PYENV_SHELL=bash
INSTALL_DIR_PREFIX_SDK_ROOT=/opt/rocm_sdk_612
CPPFLAGS_DEFAULT=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
PKG_CONFIG_PATH={INSTALL_DIR_PREFIX_SDK_ROOT}/lib64/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/lib/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/share/pkgconfig
HCC_HOME=/opt/rocm_sdk_612/hcc
UPSTREAM_REPO_VERSION_TAG_DEFAULT=rocm-6.1.2
HIPCC_VERBOSE=7
APP_CMAKE_CFG_FLAGS_DEFAULT=-DCMAKE_INSTALL_LIBDIR=lib64
ROCM_MINOR_VERSION=1
SDK_C_COMPILER_HIPCC=/opt/rocm_sdk_612/bin/hipcc
ROCM_MAJOR_VERSION=6
PWD=/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
LOGNAME=cb88
XDG_SESSION_TYPE=tty
CCACHE_DIR=/home/cb88/.ccache
CMAKE_BUILD_TYPE_DEFAULT=Release
ROCM_VERSION_NMBR=60102
BUILD_CPU_COUNT_DEFAULT=8
ROCM_VERSION_STR_ZEROED_NO_DOTS=60102
CMAKE_BUILD_TYPE_RELEASE=Release
MOTD_SHOWN=pam
SDK_SRC_PYTHON_WHEEL_BACKUP_DIR=/home/cb88/rocm_sdk_builder/packages/whl
LDFLAGS=-L/opt/rocm_sdk_612/lib64 -L/opt/rocm_sdk_612/lib -L/opt/rocm_sdk_612/hsa/lib -L/opt/rocm_sdk_612/rocblas/lib -L/opt/rocm_sdk_612/hcc/lib
HOME=/home/cb88
ROCM_PYTHON_VERSION=v3.11.11
TRITON_HIP_LLD_PATH=/opt/rocm_sdk_612/bin/ld.lld
LANG=en_US.UTF-8
ROCM_LIBPATCH_VERSION=60102
INSTALL_DIR_PREFIX_SDK_AI_MODELS=/opt/rocm_sdk_models
HIP_PATH=/opt/rocm_sdk_612
BUILD_CPU_COUNT_MIN=1
python=python
CPPFLAGS=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
SDK_C_COMPILER_DEFAULT=/opt/rocm_sdk_612/bin/hipcc
BINFO_GPU_TARGET_COUNT_DEFAULT=1
INSTALL_DIR_PREFIX_HIP_LLVM=/opt/rocm_sdk_612
HIP_PLATFORM=amd
HIP_PLATFORM_DEFAULT=amd
XDG_SESSION_CLASS=user
INSTALL_DIR_PREFIX_C_COMPILER=/opt/rocm_sdk_612
TERM=xterm
BABS_VERSION=2025_01_27_01
BUILD_CPU_COUNT_MAX=48
ROCM_DIR=/opt/rocm_sdk_612
CMAKE_BUILD_TYPE_DEBUG=Debug
USER=cb88
SDK_PLATFORM_NAME_HIPCLANG=clang
ROCM_PATCH_VERSION=2
DEVICE_LIB_PATH=/opt/rocm_sdk_612/amdgcn/bitcode
ROCBLAS_HOME=/opt/rocm_sdk_612/rocblas
INSTALL_DIR_PREFIX_HIPCC=/opt/rocm_sdk_612
SDK_CXX_COMPILER_DEFAULT=/opt/rocm_sdk_612/bin/hipcc
ROCM_TARGET_TRIPLED=x86_64-rocm-linux-gnu
DISPLAY=localhost:1.0
SHLVL=2
MAX_JOBS=8
ROCM_SDK_BUILDER_SRC_REV=a6e83969
BUILD_CPU_COUNT_SAFE=8
BINFO_BUILD_CPU_COUNT=8
XDG_SESSION_ID=37
HIP_PATH_DEFAULT=/opt/rocm_sdk_612
ROCM_SDK_VERSION_INFO=rocm-6.1.2
ROCM_PATH=/opt/rocm_sdk_612
LD_LIBRARY_PATH=/opt/rocm_sdk_612/hcc/lib:/opt/rocm_sdk_612/rocblas/lib:/opt/rocm_sdk_612/hsa/lib:/opt/rocm_sdk_612/lib:/opt/rocm_sdk_612/lib64:/lib64:/opt/rocm_sdk_612/lib64:/opt/rocm_sdk_612/lib:/opt/rocm_sdk_612/hsa/lib
PATCH_FILE_ROOT_DIR=/home/cb88/rocm_sdk_builder/patches/rocm-6.1.2
XDG_RUNTIME_DIR=/run/user/1000
HCC_PATH=/opt/rocm_sdk_612/hcc/bin
PYENV_ROOT=/home/cb88/.pyenv
DEBUGINFOD_URLS=https://debuginfod.archlinux.org
BUILD_SCRIPT_ROOT_DIR=/home/cb88/rocm_sdk_builder/build
SDK_SRC_ROOT_DIR=/home/cb88/rocm_sdk_builder/src_projects
CPACK_RPM_PACKAGE_RELEASE=01
SDK_CXX_COMPILER_HIPCC=/opt/rocm_sdk_612/bin/hipcc
BUILD_ROOT_DIR=/home/cb88/rocm_sdk_builder/builddir
XDG_DATA_DIRS=/home/cb88/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
SDK_C_COMPILER_HIP_CLANG=/opt/rocm_sdk_612/bin/clang
PATH=/opt/rocm_sdk_612/bin:/opt/rocm_sdk_612/hcc/bin:/opt/rocm_sdk_612/bin:/home/cb88/.pyenv/shims:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin:/usr/lib/rustup/bin
CFLAGS=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
SDK_PLATFORM_NAME_HIPCC=amd
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
MAIL=/var/spool/mail/cb88
SSH_TTY=/dev/pts/1
BUILD_CPU_COUNT_MODERATE=8
INSTALL_DIR_PREFIX_HIP_CLANG=/opt/rocm_sdk_612
ROCM_SDK_RELEASE_VERSION=1
ROCM_VERSION_STR=6.1.2
APP_CMAKE_CFG_FLAGS_DEBUG=-DCMAKE_C_FLAGS_DEBUG=-g3 -DCMAKE_CXX_FLAGS_DEBUG=-g3
OLDPWD=/home/cb88/rocm_sdk_builder/builddir/033_02_composable_kernel_jit
BUILD_RULE_ROOT_DIR=/home/cb88/rocm_sdk_builder/binfo
_=/usr/bin/env
/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
[78] Configuration: AMDMIGraphX
BINFO_APP_CMAKE_CFG: -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_CMAKE_CFG: -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib64 -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
Configuring AMDMIGraphX
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib64 -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
-- half.hpp is at /opt/rocm_sdk_612/include
-- Enable gpu backend
-- Clang tidy found: 18.0.0git
-- Clang tidy checks: boost-*,bugprone-*,cert-*,clang-analyzer-*,clang-diagnostic-*,cppcoreguidelines-*,google-*,hicpp-multiway-paths-covered,hicpp-signed-bitwise,llvm-namespace-comment,misc-*,-misc-confusable-identifiers,-misc-use-anonymous-namespace,modernize-*,performance-*,readability-*,-bugprone-easily-swappable-parameters,-bugprone-implicit-widening-of-multiplication-result,-bugprone-macro-parentheses,-bugprone-signed-char-misuse,-bugprone-unchecked-optional-access,-cert-dcl37-c,-cert-dcl51-cpp,-cert-err33-c,-cert-str34-c,-clang-analyzer-alpha*,clang-analyzer-alpha.core.CallAndMessageUnInitRefArg,clang-analyzer-alpha.core.Conversion,clang-analyzer-alpha.core.IdenticalExpr,clang-analyzer-alpha.core.PointerArithm,clang-analyzer-alpha.core.PointerSub,clang-analyzer-alpha.core.TestAfterDivZero,clang-analyzer-alpha.cplusplus.InvalidIterator,clang-analyzer-alpha.cplusplus.IteratorRange,clang-analyzer-alpha.cplusplus.MismatchedIterator,clang-analyzer-alpha.cplusplus.MisusedMovedObject,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-disabled-macro-expansion,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-do-while,-cppcoreguidelines-avoid-const-or-ref-data-members,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-cppcoreguidelines-virtual-class-destructor,-cppcoreguidelines-avoid-capture-default-when-capturing-this,-cppcoreguidelines-rvalue-reference-param-not-moved,-google-readability-*,-google-runtime-int,-google-runtime-references,-misc-macro-parentheses,-misc-no-recursion,-
modernize-concat-nested-namespaces,-modernize-pass-by-value,-modernize-use-default-member-init,-modernize-use-nodiscard,-modernize-use-override,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-type-promotion-in-math-fn,-readability-braces-around-statements,-readability-convert-member-functions-to-static,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-identifier-length,-readability-named-parameter,-readability-redundant-string-init,-readability-suspicious-call-argument,-readability-uppercase-literal-suffix,-*-avoid-c-arrays,-*-explicit-constructor,-*-magic-numbers,-*-narrowing-conversions,-*-non-private-member-variables-in-classes,-*-use-auto,-*-use-emplace,-*-use-equals-default
-- Cppcheck found: 2.16.0
-- Parallel STL disabled
CMake Warning (dev) at /usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package):
Policy CMP0167 is not set: The FindBoost module is removed. Run "cmake
--help-policy CMP0167" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
Call Stack (most recent call first):
/usr/lib/cmake/msgpack-cxx/msgpack-cxx-config.cmake:40 (find_dependency)
src/CMakeLists.txt:305 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found pybind11: /usr/include (found version "2.13.6")
-- Python 3.5 not found.
-- Python 3.6 not found.
-- Python 3.7 not found.
-- Python 3.8 not found.
-- Python 3.9 not found.
pyenv: python3.10-config: command not found
The `python3.10-config' command exists in these Python versions:
3.10.15
Note: See 'pyenv help global' for tips on allowing both
python2 and python3 to be found.
CMake Error at cmake/PythonModules.cmake:31 (message):
Process failed:
COMMAND;/home/cb88/.pyenv/shims/python3.10-config;--includes;OUTPUT_VARIABLE;_python_include_args
Call Stack (most recent call first):
cmake/PythonModules.cmake:40 (py_exec)
cmake/PythonModules.cmake:97 (find_python)
src/py/CMakeLists.txt:31 (include)
-- Configuring incomplete, errors occurred!
configure failed: AMDMIGraphX
Note that python3.10-config does exist; I'm not sure why this fails.
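The pyenv message in the log says that the python3.10-config shim was found on PATH, but that the currently active pyenv version does not provide that command (it exists only in 3.10.15). One thing that might be worth trying is running the build with the pyenv shims directory removed from PATH, so that CMake no longer sees the shim. This is an untested sketch; strip_path_entries is a made-up helper, and whether babs.sh keeps the modified PATH is an assumption:

```shell
# Print a PATH-style string with every entry containing the substring $2 removed.
strip_path_entries() {
    printf '%s' "$1" | tr ':' '\n' | grep -v -F -- "$2" | paste -sd ':' -
}

# Usage:
#   PATH=$(strip_path_entries "$PATH" "/.pyenv/shims") ./babs.sh -b
```

Running `pyenv which python3.10-config` beforehand should show whether the shim can be resolved with the currently selected pyenv version.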
I tried building vllm at this point.
Note that I did install the setuptools_scm package via pacman -S python-setuptools-scm.
[cb88@M31-AR0 rocm_sdk_builder]$ ./babs.sh -b binfo/extra/ai_tools.blist
ROCM_TARGET_TRIPLED: x86_64-rocm-linux-gnu
ROCM_PYTHON_VERSION: v3.11.11
INSTALL_DIR_PREFIX_SDK_ROOT: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_SDK_AI_MODELS: /opt/rocm_sdk_models
selected GPUs: gfx906
build
SDK_CXX_COMPILER_DEFAULT: /opt/rocm_sdk_612/bin/hipcc
HIP_PLATFORM_DEFAULT: amd
HIP_PLATFORM: hcc
HIP_PATH: /opt/rocm_sdk_612
SDK_ROOT_DIR: /home/cb88/rocm_sdk_builder
SDK_SRC_ROOT_DIR: /home/cb88/rocm_sdk_builder/src_projects
BUILD_RULE_ROOT_DIR: /home/cb88/rocm_sdk_builder/binfo
PATCH_FILE_ROOT_DIR: /home/cb88/rocm_sdk_builder/patches/rocm-6.1.2
BUILD_ROOT_DIR: /home/cb88/rocm_sdk_builder/builddir
INSTALL_DIR_PREFIX_SDK_ROOT: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIPCC: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIP_CLANG: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_C_COMPILER: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIP_LLVM: /opt/rocm_sdk_612
SPACE_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
SEMICOLON_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
LF_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
HIP_PATH_DEFAULT: /opt/rocm_sdk_612
SDK_ROOT_DIR: /home/cb88/rocm_sdk_builder
APP_INFO_FULL_NAME: binfo/extra/ai_tools_dependencies.binfo
---------------------------
[1] BINFO_APP_NAME: pytorch_dependencies
BINFO FILE: binfo/extra/ai_tools_dependencies.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch_dependencies
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/ai_tools_dependencies
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/ai_tools_dependencies/.result_install
---------------------------
APP_INFO_FULL_NAME: binfo/extra/vllm.binfo
---------------------------
[2] BINFO_APP_NAME: vllm
BINFO FILE: binfo/extra/vllm.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: vllm
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/vllm
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/vllm
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/vllm
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/vllm/.result_install
---------------------------
/home/cb88/rocm_sdk_builder/builddir/vllm
[2] Building: vllm
[0] vllm, build command:
cd /home/cb88/rocm_sdk_builder/src_projects/vllm
[1] vllm, build command:
./build_rocm.sh /opt/rocm_sdk_612 gfx906
using rocm_root_directory specified: /opt/rocm_sdk_612
Using specified amd rocm gpu: gfx906
Traceback (most recent call last):
File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 16, in <module>
from setuptools_scm import get_version
ModuleNotFoundError: No module named 'setuptools_scm'
build failed: vllm
error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612 gfx906
[cb88@M31-AR0 rocm_sdk_builder]$
You could probably do:
source /opt/rocm_sdk_612/bin/env_rocm.sh
pip install setuptools_scm
But vllm may require PyTorch. I still have not had time to set up Arch Linux. I have something of a storage problem, as I am now also doing 3 Docker builds (one for CDNA, one for RDNA1/2 and one for RDNA3 cards).
I was actually able to build pytorch, but not torch vision or audio.
I also built the older tensorflow version.
I went back and tried to build vllm again (after running pip install setuptools_scm) and got this:
/home/cb88/rocm_sdk_builder/builddir/vllm
[2] Building: vllm
[0] vllm, build command:
cd /home/cb88/rocm_sdk_builder/src_projects/vllm
[1] vllm, build command:
./build_rocm.sh /opt/rocm_sdk_612 gfx906
using rocm_root_directory specified: /opt/rocm_sdk_612
Using specified amd rocm gpu: gfx906
No ROCm runtime is found, using ROCM_HOME='/opt/rocm_sdk_612'
Traceback (most recent call last):
File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 631, in
Having another go at building this on Arch.
The llvm build failed to detect pfmlib.h.
I installed libpfm via pacman and it seems to continue building.
It fails here now:
CCLD libucs.la
/opt/rocm_sdk_612/bin/ld: /opt/rocm_sdk_612/lib/libbfd.a(elf64.o): warning: relocation against `bfd_elf64_swap_reloca_out' in read-only section `.text'
/opt/rocm_sdk_612/bin/ld: /opt/rocm_sdk_612/lib/libbfd.a(bfd.o): relocation R_X86_64_PC32 against symbol `_bfd_error_buf' can not be used when making a shared object; recompile with -fPIC
/opt/rocm_sdk_612/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:1290: libucs.la] Error 1
make[3]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi/src/ucs'
make[2]: *** [Makefile:2086: all-recursive] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi/src/ucs'
make[1]: *** [Makefile:802: all-recursive] Error 1
make[1]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi'
make: *** [Makefile:664: all] Error 2
build failed: ucx
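The linker error above names the static archive and object that were compiled without -fPIC. In a long build log, those can be pulled out with a small filter. A sketch only, assuming GNU ld's message format exactly as it appears in this log; nonpic_archives is a hypothetical helper name:

```shell
# Read linker output on stdin and print "<archive> <object>" for every
# "recompile with -fPIC" relocation error.
nonpic_archives() {
    sed -n 's|^.*: \(/[^ (]*\.a\)(\([^)]*\)): relocation .* recompile with -fPIC$|\1 \2|p'
}

# Usage (build.log is a placeholder for wherever the build output was saved):
#   nonpic_archives < build.log | sort -u
```

For the log above this reports /opt/rocm_sdk_612/lib/libbfd.a with bfd.o, so the libbfd.a installed in the SDK is being pulled into the shared libucs.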