rocm_sdk_builder

build failure unable to find library -lhsakmt

Open cb88 opened this issue 1 year ago • 25 comments

ld.lld: error: unable to find library -lhsakmt
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
[ 51%] Built target Utils.cpp-gfx906.bc
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

This is after running:

./babs.sh -up
./babs.sh --clean
./babs.sh -b

Git revision: 84faa05. I was attempting to test on my MI60 but haven't been able to get a clean build on Arch Linux.

cb88 avatar Dec 17 '24 17:12 cb88

Have you been able to build the "016_03_llvm_project_openmp" project before? I know that some people have used Arch Linux earlier. Do you have multiple versions of the library? Check with:

cd /opt
find -name libhsakmt.so

I have

./rocm_sdk_612/lib64/libhsakmt.so
./rocm_sdk_612/lib/libhsakmt.so

All libhsakmt.* entries in the lib directory are symlinks that resolve to lib64.

ls -la /opt/rocm_sdk_612/lib/libhsakmt.*
lrwxrwxrwx 1 lamikr lamikr 35 Nov 12 00:13 /opt/rocm_sdk_612/lib/libhsakmt.a -> /opt/rocm_sdk_612/lib64/libhsakmt.a
lrwxrwxrwx 1 lamikr lamikr 36 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so -> /opt/rocm_sdk_612/lib/libhsakmt.so.1*
lrwxrwxrwx 1 lamikr lamikr 40 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so.1 -> /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6*
lrwxrwxrwx 1 lamikr lamikr 42 Nov 12 00:12 /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6 -> /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6*
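
A second copy of the library elsewhere under /opt, or a symlink that no longer resolves, would be worth ruling out here, so it can help to print where every copy actually ends up. A minimal sketch, assuming GNU find and readlink; list_lib_copies is a hypothetical helper name and not part of babs.sh:

```shell
# Print every file called NAME below ROOT together with its fully
# resolved symlink target. Hypothetical helper, for diagnosis only.
list_lib_copies() {
    root="$1"
    name="$2"
    find "$root" -name "$name" 2>/dev/null | sort | while IFS= read -r path; do
        # readlink -f follows the whole symlink chain
        target=$(readlink -f "$path")
        printf '%s -> %s\n' "$path" "$target"
    done
}
```

For the check above this would be called as list_lib_copies /opt libhsakmt.so. Any line whose target lies outside /opt/rocm_sdk_612 belongs to a different install.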

lamikr avatar Dec 18 '24 02:12 lamikr

And let's check that all ldd dependencies are found. What does this show:

ldd /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6
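
The part of the ldd output that matters is any line ending in "not found". A small sketch that filters for exactly that; check_missing_deps is a hypothetical helper name, and the return codes are my own convention:

```shell
# Report unresolved shared-library dependencies of one file.
# Returns 0 if everything resolves, 1 if something is missing,
# 2 if the file itself does not exist. Hypothetical helper.
check_missing_deps() {
    lib="$1"
    if [ ! -e "$lib" ]; then
        echo "no such file: $lib"
        return 2
    fi
    # ldd marks unresolved libraries with "not found"
    missing=$(ldd "$lib" 2>/dev/null | grep 'not found' || true)
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
    echo "all dependencies resolved: $lib"
    return 0
}
```

Usage would be check_missing_deps /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6.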

lamikr avatar Dec 18 '24 02:12 lamikr

After the recent posts in the other ticket, it's building and appears to be progressing further. I will update here with the results once it completes or fails.

cb88 avatar Dec 18 '24 02:12 cb88

Thanks for letting me know. It would be nice to know what caused the breakage. So you have a Radeon VII to test gfx906?

lamikr avatar Dec 18 '24 05:12 lamikr

I have 2x MI60 (or 32GB MI50, whichever it really is). The build stopped a while ago; I reran ./babs.sh -b and it failed here:

adding 'torchvision-0.20.0a0+324eea9.dist-info/LICENSE'
adding 'torchvision-0.20.0a0+324eea9.dist-info/METADATA'
adding 'torchvision-0.20.0a0+324eea9.dist-info/WHEEL'
adding 'torchvision-0.20.0a0+324eea9.dist-info/top_level.txt'
adding 'torchvision-0.20.0a0+324eea9.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
corrupted size vs. prev_size in fastbins
./build_rocm.sh: line 18: 3102616 Aborted (core dumped) ROCM_PATH=${install_dir_prefix_rocm} FORCE_CUDA=1 TORCHVISION_USE_NVJPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 CC=${CMAKE_C_COMPILER} CXX=${CMAKE_CXX_COMPILER} python setup.py bdist_wheel
build failed: pytorch_vision
  error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612

cb88 avatar Dec 18 '24 06:12 cb88

Hmm... I'm not really sure what is going on. In theory the benchmark should now be able to run some PyTorch tests, since the build has got past PyTorch itself and is now trying to build pytorch vision.

So are you able to test with:

source /opt/rocm_sdk_612/bin/env_rocm.sh
cd /opt/rocm_sdk_612/benchmarks
./run_and_save_benchmarks.sh

If you are on the master branch, can you run these commands once more to verify that everything is up to date, and then restart the pytorch vision build from clean?

./babs.sh -up
./babs.sh -ca
./babs.sh --clean binfo/core/039_03_pytorch_vision.binfo
./babs.sh -b

I have started a clean build myself on Fedora 40 with gfx906 as the only target, but I need to wait until morning to see the results.

lamikr avatar Dec 18 '24 08:12 lamikr

[cb88@M31-AR0 ~]$ cat /opt/rocm_sdk_612/benchmarks/bench.txt
Timestamp for benchmark results: 20241218_133446
Saving to file: 20241218_133446_cpu_vs_gpu_simple.txt
Benchmarking CPU and GPUs
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-de7055040
Device: AMD EPYC 7352 24-Core Processor
'CPU time: 35.332 sec
Device: AMD Radeon Graphics
'GPU time: 0.604 sec
Benchmark ready

Saving to file: 20241218_133446_pytorch_dot_products.txt
Pytorch version: 2.4.1
dot product calculation test
tensor([[[ 0.8124,  0.2179, -0.4919, -0.4980, -0.6716,  1.2153, -0.0119,
          -0.9560],
         [-0.7172,  0.4881,  0.9783, -0.3172, -0.0765,  1.5946, -0.1057,
           0.1876],
         [ 0.8850,  0.3325, -0.6169, -0.5590, -0.7152,  1.3886, -0.0615,
          -1.1245]],

        [[ 0.2982, -0.1511,  0.2687, -0.8882,  0.1656,  0.1409, -1.0829,
           0.6578],
         [-0.2719,  0.9328, -0.8428, -0.5765, -0.2355,  0.1816, -0.3346,
          -0.5164],
         [ 0.8432,  0.4674, -0.1435,  0.2439, -0.3148,  1.1532, -0.3879,
          -0.1294]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics / cuda:0
Default benchmark: 3205.060 microseconds, 0.0032050598703790454 sec
SDPBackend.MATH benchmark: 3212.746 microseconds, 0.0032127462700009346 sec
SDPBackend.FLASH_ATTENTION benchmark: SDPBackend.FLASH_ATTENTION cuda:0 is not supported. See warnings for reasons.
SDPBackend.EFFICIENT_ATTENTION benchmark: SDPBackend.EFFICIENT_ATTENTION cuda:0 is not supported. See warnings for reasons.
Device: AMD EPYC 7352 24-Core Processor / cpu
Default benchmark: 3844997.412 microseconds, 3.844997411943041 sec
SDPBackend.MATH benchmark: 3642490.409 microseconds, 3.6424904089653865 sec
SDPBackend.FLASH_ATTENTION benchmark: 3828689.283 microseconds, 3.8286892829928547 sec
SDPBackend.EFFICIENT_ATTENTION benchmark: SDPBackend.EFFICIENT_ATTENTION cpu is not supported. See warnings for reasons.

Summary

Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-de7055040
CPU: AMD EPYC 7352 24-Core Processor
Problem parameters:
Sequence-length: 512
Batch-size: 32
Heads: 16
Embed_dimension: 16
Datatype: torch.float16

Device: AMD Radeon Graphics / cuda:0
Default: 3205.060 ms
SDPBackend.MATH: 3212.746 ms
SDPBackend.FLASH_ATTENTION: -1.000 ms
SDPBackend.EFFICIENT_ATTENTION: -1.000 ms

Device: AMD EPYC 7352 24-Core Processor / cpu
Default: 3844997.412 ms
SDPBackend.MATH: 3642490.409 ms
SDPBackend.FLASH_ATTENTION: 3828689.283 ms
SDPBackend.EFFICIENT_ATTENTION: -1.000 ms

cb88 avatar Dec 18 '24 18:12 cb88

[cb88@M31-AR0 opt]$ find -name libhsakmt.so
./rocm_sdk_612/lib64/libhsakmt.so
./rocm_sdk_612/lib/libhsakmt.so
./rocm/lib/libhsakmt.so

/opt/rocm is the binary install from Arch.

[cb88@M31-AR0 opt]$ ls -la /opt/rocm_sdk_612/lib/libhsakmt.*
lrwxrwxrwx 1 cb88 cb88 35 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.a -> /opt/rocm_sdk_612/lib64/libhsakmt.a
lrwxrwxrwx 1 cb88 cb88 36 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so -> /opt/rocm_sdk_612/lib/libhsakmt.so.1
lrwxrwxrwx 1 cb88 cb88 40 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so.1 -> /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6
lrwxrwxrwx 1 cb88 cb88 42 Dec 11 12:28 /opt/rocm_sdk_612/lib/libhsakmt.so.1.0.6 -> /opt/rocm_sdk_612/lib64/libhsakmt.so.1.0.6

cb88 avatar Dec 18 '24 18:12 cb88

So it seems that the original problem with the missing symbol in rocBLAS is now solved for you as well, and PyTorch is able to use rocBLAS with the MATH backend. llama.cpp, which was earlier also failing for Said-akbar, should probably work now too if you build it with:

./babs.sh -ca binfo/extra/ai_tools.blist
./babs.sh -b binfo/extra/ai_tools.blist

and then run

cd /opt/rocm_sdk_612/docs/examples/llm/llama_cpp/
./run_llama_benchmark.sh

There is still a second problem, with flash attention, that needs to be solved. And at the moment I have no idea why the pytorch vision build fails for you.

lamikr avatar Dec 18 '24 19:12 lamikr

./babs.sh -b binfo/extra/ai_tools.blist ran for a bit then...

-- Found Python: /opt/rocm_sdk_612/bin/python (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /opt/rocm_sdk_612/bin/python.
CMake Error at cmake/utils.cmake:37 (message):
  Failed to locate torch path: corrupted size vs. prev_size in fastbins

Call Stack (most recent call first):
  cmake/utils.cmake:45 (run_python)
  CMakeLists.txt:70 (append_cmake_prefix_path)

-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 483, in <module>
    setup(
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/wheel/_bdist_wheel.py", line 387, in run
    self.run_command("build")
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 235, in run
    super().run()
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/opt/rocm_sdk_612/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 197, in build_extensions
    self.configure(ext)
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 177, in configure
    subprocess.check_call(
  File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/cb88/rocm_sdk_builder/src_projects/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=rocm', '-DVLLM_PYTHON_EXECUTABLE=/opt/rocm_sdk_612/bin/python', '-DVLLM_PYTHON_PATH=/home/cb88/rocm_sdk_builder/src_projects/vllm:/opt/rocm_sdk_612/lib/python311.zip:/opt/rocm_sdk_612/lib/python3.11:/opt/rocm_sdk_612/lib/python3.11/lib-dynload:/opt/rocm_sdk_612/lib/python3.11/site-packages', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=8']' returned non-zero exit status 1.
corrupted size vs. prev_size in fastbins
./build_rocm.sh: line 22: 3139439 Aborted (core dumped) python setup.py bdist_wheel
build failed: vllm
  error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612 gfx906

cb88 avatar Dec 18 '24 19:12 cb88

Hmm, the vllm build, which runs before the llama.cpp build, seems to fail with the same kind of error as pytorch vision: "corrupted size vs. prev_size in fastbins".

How about if you build just llama.cpp:

./babs.sh -b binfo/extra/llama_cpp.binfo
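
The "Failed to locate torch path" step in the vllm configure log can also be probed on its own, outside the build, to see whether simply importing torch with the SDK's Python already aborts. A minimal sketch; probe_torch is a hypothetical helper, and I am assuming the CMake step queries torch.utils.cmake_prefix_path, which the log suggests but does not show:

```shell
# Import torch with the given interpreter and print its CMake prefix path.
# Returns the interpreter's exit status. Hypothetical helper; the default
# interpreter path is taken from the build log above.
probe_torch() {
    py="${1:-/opt/rocm_sdk_612/bin/python}"
    rc=0
    "$py" -c 'import torch; print(torch.utils.cmake_prefix_path)' || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "torch probe failed with exit status $rc" >&2
    fi
    return "$rc"
}
```

It is meant to be run after source /opt/rocm_sdk_612/bin/env_rocm.sh. An exit status of 134 corresponds to SIGABRT, which would match the "Aborted (core dumped)" lines in the logs.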

lamikr avatar Dec 18 '24 21:12 lamikr

llama.cpp builds and runs successfully.

cb88 avatar Dec 18 '24 22:12 cb88

Something is still not right with it though...

llama_kv_cache_init: ROCm0 KV buffer size = 4000.00 MiB
ggml_cuda_host_malloc: failed to allocate 156000.00 MiB of pinned memory: out of memory
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 163577856032
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
llama_init_from_gpt_params: failed to create context with model
main: error: unable to load model

This is a small model that koboldcpp can load fully in VRAM. It didn't matter whether I passed -ngl 1 or 999; it still tried to allocate a huge buffer and failed.
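
The 4000 MiB on ROCm0 and the 156000 MiB on the host add up to 160000 MiB of KV cache. That would explain why -ngl makes no difference: it moves layers between devices but does not shrink the cache, which grows linearly with the context length. A back-of-the-envelope sketch using the usual estimate of 2 (K and V) x layers x context x KV width x bytes per element. The model parameters below are hypothetical values picked only because they reproduce the totals in the log; the actual model is not named in this thread:

```shell
# Estimate KV cache size in MiB.
# Arguments: layer count, context length, KV embedding width,
# bytes per element (defaults to 2, i.e. f16).
kv_cache_mib() {
    n_layer="$1"
    n_ctx="$2"
    n_embd_kv="$3"
    bytes_per_elem="${4:-2}"
    echo $(( 2 * n_layer * n_ctx * n_embd_kv * bytes_per_elem / 1048576 ))
}

# Hypothetical model: 40 layers, KV width 1024, context of 1024000 tokens
kv_cache_mib 40 1024000 1024   # prints 160000
# Same hypothetical model with the context limited to 4096 tokens
kv_cache_mib 40 4096 1024      # prints 640
```

In llama.cpp the context length is set with -c (--ctx-size). Whether run_llama_benchmark.sh passes it is not shown here, so limiting it is something to try rather than a confirmed fix.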

cb88 avatar Dec 18 '24 22:12 cb88

I noticed from your build log that vllm has been configured with RelWithDebInfo. I am wondering whether you have built the whole ROCm SDK stack in that mode instead of Release? (This can be set in envsetup_user.sh.)

If all binaries grow very big, that could perhaps explain the "corrupted size vs. prev_size in fastbins" problem you are seeing. I usually build everything in release mode first, and then rebuild only the individual libraries I need to debug with gdb, with the debug build option enabled just for those.

A web search turned up some similar errors related to process count and out-of-memory situations.

Anyway, here is my vllm log from the same part that fails in your build:

-- Target device: rocm
-- Found Python: /opt/rocm_sdk_612/bin/python (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule 
-- Found python matching: /opt/rocm_sdk_612/bin/python.
Building PyTorch for GPU arch: gfx1035
-- Found HIP: /opt/rocm_sdk_612 (found suitable version "6.1.40093-9e5ee4609", minimum required is "1.0") 
HIP VERSION: 6.1.40093-9e5ee4609
-- Caffe2: Header version is: 6.1.2

***** ROCm version from rocm_version.h ****

ROCM_VERSION_DEV: 6.1.2
ROCM_VERSION_DEV_MAJOR: 6
ROCM_VERSION_DEV_MINOR: 1
ROCM_VERSION_DEV_PATCH: 2
ROCM_VERSION_DEV_INT:   60102
HIP_VERSION_MAJOR: 6
HIP_VERSION_MINOR: 1
TORCH_HIP_VERSION: 601


lamikr avatar Dec 19 '24 06:12 lamikr

If you have time, would you be able to do a completely new build with the latest code to check whether the same errors happen again? And do you have an envsetup_user.sh?

These commands should clean up everything and then restart the build. (I added the -ca command just in case, to re-verify that all patches are applied to the projects.)

cd rocm_sdk_builder
sudo rm -rf /opt/rocm_sdk_612
rm -rf builddir
git checkout master
./babs.sh -up
./babs.sh -ca
./babs.sh -b

lamikr avatar Dec 19 '24 06:12 lamikr

I did not have an envsetup_user.sh, so it was completely default.

cb88 avatar Dec 19 '24 16:12 cb88

OK, then that does not explain the problem. Maybe I need to install Arch Linux myself and try it out. I think I will first do it in a virtual machine.

Can you check what CMAKE_BUILD_TYPE is set to in, for example, builddir/004_01_roct-thunk-interface_shared/CMakeCache.txt?

I have there: CMAKE_BUILD_TYPE:STRING=Release
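
Rather than opening one cache file at a time, the same value can be listed for every project in builddir at once. A minimal sketch; list_build_types is a hypothetical helper, and it only assumes the builddir/<project>/CMakeCache.txt layout visible in the paths above:

```shell
# Print "<project>: <build type>" for each CMakeCache.txt one level
# below the given build directory. Hypothetical helper.
list_build_types() {
    builddir="$1"
    for cache in "$builddir"/*/CMakeCache.txt; do
        # Skip the unexpanded glob when nothing matches
        [ -f "$cache" ] || continue
        value=$(grep '^CMAKE_BUILD_TYPE:' "$cache" | cut -d= -f2)
        printf '%s: %s\n' "$(basename "$(dirname "$cache")")" "${value:-<unset>}"
    done
}
```

Run from the rocm_sdk_builder checkout as list_build_types builddir. Projects that are not built with CMake have no cache file and are simply skipped.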

lamikr avatar Dec 19 '24 20:12 lamikr

It crashed here yesterday when I tried to build. I updated again today and tried again, with the same result... I suspect the new version of cmake on Arch?

[ 9%] Generating source/cube.bc
cd /home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport/test && /opt/rocm_sdk_612/bin/clang-17 -c --offload-arch=gfx900 -emit-llvm -fgpu-rdc --gpu-bundle-output /home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/cube.hip -o source/cube.bc
clang-17: warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target reloc-asm
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Linking C executable bc2h
/usr/bin/cmake -E cmake_link_script CMakeFiles/bc2h.dir/link.txt --verbose=1
[ 9%] Built target reloc2
[ 9%] Built target reloc1
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/square.hip:23:10: fatal error: 'hip/hip_runtime.h' file not found
   23 | #include "hip/hip_runtime.h"
      |          ^~~~~~~~~~~~~~~~~~~
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/double.hip:23:10: fatal error: 'hip/hip_runtime.h' file not found
/home/cb88/rocm_sdk_builder/src_projects/llvm-project/amd/comgr/test/source/cube.hip:23:10: 23fatal error: | #i'hip/hip_runtime.h' file not foundnc lude " h23i | p#/ihnicpl_u/usr/bin/cc -I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include -O3 -DNDEBUG -L/opt/rocm_sdk_612/lib64 -L/opt/rocm_sdk_612/lib -L/opt/rocm_sdk_612/hsa/lib -L/opt/rocm_sdk_612/rocblas/lib -L/opt/rocm_sdk_612/hcc/lib -Wl,--dependency-file=CMakeFiles/bc2h.dir/link.d CMakeFiles/bc2h.dir/bc2h.c.o -o bc2h rduen t"ihmiep./hh"ip _ r| un ^~~~~~~~~~~~~~~~~~~t ime.h" | ^~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx900.
1 error generated when compiling for gfx900.
1 error generated when compiling for gfx900.
make[2]: *** [test/CMakeFiles/square.dir/build.make:75: test/source/square.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: *** [test/CMakeFiles/cube.dir/build.make:75: test/source/cube.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[1]: *** [CMakeFiles/Makefile2:5341: test/CMakeFiles/square.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: *** [test/CMakeFiles/double.dir/build.make:75: test/source/double.bc] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[1]: *** [CMakeFiles/Makefile2:5309: test/CMakeFiles/cube.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:5373: test/CMakeFiles/double.dir/all] Error 2
[ 9%] Built target shared-debug
[ 9%] Built target shared
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target bc2h
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target source1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
[ 9%] Built target opencl1.2-c.pch_target
[ 9%] Built target opencl2.0-c.pch_target
make[1]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/009_02_rocm-compilersupport'
make: *** [Makefile:166: all] Error 2
build failed: ROCm-CompilerSupport
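
Since the one real error in that log is the missing hip/hip_runtime.h, a first check is whether the header exists anywhere under the install prefix at this point of a from-scratch build. A minimal sketch; find_header is a hypothetical helper and only tells you whether the file is installed, not why clang does not look there:

```shell
# Print every file below ROOT whose path ends in the given relative
# header path. Prints nothing when the header is absent. Hypothetical helper.
find_header() {
    root="$1"
    rel="$2"
    find "$root" -type f -path "*/$rel" 2>/dev/null
}
```

For example: find_header /opt/rocm_sdk_612 hip/hip_runtime.h. Empty output would suggest the HIP headers have not been installed yet when the comgr tests are compiled.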

cb88 avatar Dec 21 '24 16:12 cb88

I really do not have a fix for this for now. I plan to start updating the ROCm base components to a newer version soon, and I hope that will help. First I will just push out the Boost update to 1.87.0 and some XDNA/NPU work that I managed to get working.

lamikr avatar Jan 10 '25 00:01 lamikr

It seems to blow up much later now, after updating and reinstalling some of my system packages.

LIST_BINFO_FILE_FULLNAME[78]: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
APP_INFO_FULL_NAME: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo

---------------------------
[78] BINFO_APP_NAME: AMDMIGraphX
BINFO FILE: /home/cb88/rocm_sdk_builder/binfo/core/035_AMDMIGraphX.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: AMDMIGraphX
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX/.result_install
---------------------------


SHELL=/usr/bin/bash
SDK_CXX_COMPILER_HIP_CLANG=/opt/rocm_sdk_612/bin/clang++
CCACHE_TEMPDIR=/home/cb88/.ccache
CMAKE_BUILD_TYPE_RELWITHDEBINFO=RelWithDebInfo
PYENV_SHELL=bash
INSTALL_DIR_PREFIX_SDK_ROOT=/opt/rocm_sdk_612
CPPFLAGS_DEFAULT=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
PKG_CONFIG_PATH={INSTALL_DIR_PREFIX_SDK_ROOT}/lib64/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/lib/pkgconfig:{INSTALL_DIR_PREFIX_SDK_ROOT}/share/pkgconfig
HCC_HOME=/opt/rocm_sdk_612/hcc
UPSTREAM_REPO_VERSION_TAG_DEFAULT=rocm-6.1.2
HIPCC_VERBOSE=7
APP_CMAKE_CFG_FLAGS_DEFAULT=-DCMAKE_INSTALL_LIBDIR=lib64
ROCM_MINOR_VERSION=1
SDK_C_COMPILER_HIPCC=/opt/rocm_sdk_612/bin/hipcc
ROCM_MAJOR_VERSION=6
PWD=/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
LOGNAME=cb88
XDG_SESSION_TYPE=tty
CCACHE_DIR=/home/cb88/.ccache
CMAKE_BUILD_TYPE_DEFAULT=Release
ROCM_VERSION_NMBR=60102
BUILD_CPU_COUNT_DEFAULT=8
ROCM_VERSION_STR_ZEROED_NO_DOTS=60102
CMAKE_BUILD_TYPE_RELEASE=Release
MOTD_SHOWN=pam
SDK_SRC_PYTHON_WHEEL_BACKUP_DIR=/home/cb88/rocm_sdk_builder/packages/whl
LDFLAGS=-L/opt/rocm_sdk_612/lib64 -L/opt/rocm_sdk_612/lib -L/opt/rocm_sdk_612/hsa/lib -L/opt/rocm_sdk_612/rocblas/lib -L/opt/rocm_sdk_612/hcc/lib
HOME=/home/cb88
ROCM_PYTHON_VERSION=v3.11.11
TRITON_HIP_LLD_PATH=/opt/rocm_sdk_612/bin/ld.lld
LANG=en_US.UTF-8
ROCM_LIBPATCH_VERSION=60102
INSTALL_DIR_PREFIX_SDK_AI_MODELS=/opt/rocm_sdk_models
HIP_PATH=/opt/rocm_sdk_612
BUILD_CPU_COUNT_MIN=1
python=python
CPPFLAGS=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
SDK_C_COMPILER_DEFAULT=/opt/rocm_sdk_612/bin/hipcc
BINFO_GPU_TARGET_COUNT_DEFAULT=1
INSTALL_DIR_PREFIX_HIP_LLVM=/opt/rocm_sdk_612
HIP_PLATFORM=amd
HIP_PLATFORM_DEFAULT=amd
XDG_SESSION_CLASS=user
INSTALL_DIR_PREFIX_C_COMPILER=/opt/rocm_sdk_612
TERM=xterm
BABS_VERSION=2025_01_27_01
BUILD_CPU_COUNT_MAX=48
ROCM_DIR=/opt/rocm_sdk_612
CMAKE_BUILD_TYPE_DEBUG=Debug
USER=cb88
SDK_PLATFORM_NAME_HIPCLANG=clang
ROCM_PATCH_VERSION=2
DEVICE_LIB_PATH=/opt/rocm_sdk_612/amdgcn/bitcode
ROCBLAS_HOME=/opt/rocm_sdk_612/rocblas
INSTALL_DIR_PREFIX_HIPCC=/opt/rocm_sdk_612
SDK_CXX_COMPILER_DEFAULT=/opt/rocm_sdk_612/bin/hipcc
ROCM_TARGET_TRIPLED=x86_64-rocm-linux-gnu
DISPLAY=localhost:1.0
SHLVL=2
MAX_JOBS=8
ROCM_SDK_BUILDER_SRC_REV=a6e83969
BUILD_CPU_COUNT_SAFE=8
BINFO_BUILD_CPU_COUNT=8
XDG_SESSION_ID=37
HIP_PATH_DEFAULT=/opt/rocm_sdk_612
ROCM_SDK_VERSION_INFO=rocm-6.1.2
ROCM_PATH=/opt/rocm_sdk_612
LD_LIBRARY_PATH=/opt/rocm_sdk_612/hcc/lib:/opt/rocm_sdk_612/rocblas/lib:/opt/rocm_sdk_612/hsa/lib:/opt/rocm_sdk_612/lib:/opt/rocm_sdk_612/lib64:/lib64:/opt/rocm_sdk_612/lib64:/opt/rocm_sdk_612/lib:/opt/rocm_sdk_612/hsa/lib
PATCH_FILE_ROOT_DIR=/home/cb88/rocm_sdk_builder/patches/rocm-6.1.2
XDG_RUNTIME_DIR=/run/user/1000
HCC_PATH=/opt/rocm_sdk_612/hcc/bin
PYENV_ROOT=/home/cb88/.pyenv
DEBUGINFOD_URLS=https://debuginfod.archlinux.org
BUILD_SCRIPT_ROOT_DIR=/home/cb88/rocm_sdk_builder/build
SDK_SRC_ROOT_DIR=/home/cb88/rocm_sdk_builder/src_projects
CPACK_RPM_PACKAGE_RELEASE=01
SDK_CXX_COMPILER_HIPCC=/opt/rocm_sdk_612/bin/hipcc
BUILD_ROOT_DIR=/home/cb88/rocm_sdk_builder/builddir
XDG_DATA_DIRS=/home/cb88/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
SDK_C_COMPILER_HIP_CLANG=/opt/rocm_sdk_612/bin/clang
PATH=/opt/rocm_sdk_612/bin:/opt/rocm_sdk_612/hcc/bin:/opt/rocm_sdk_612/bin:/home/cb88/.pyenv/shims:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin:/usr/lib/rustup/bin
CFLAGS=-I/opt/rocm_sdk_612/include -I/opt/rocm_sdk_612/hsa/include -I/opt/rocm_sdk_612/rocm_smi/include -I/opt/rocm_sdk_612/rocblas/include
SDK_PLATFORM_NAME_HIPCC=amd
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
MAIL=/var/spool/mail/cb88
SSH_TTY=/dev/pts/1
BUILD_CPU_COUNT_MODERATE=8
INSTALL_DIR_PREFIX_HIP_CLANG=/opt/rocm_sdk_612
ROCM_SDK_RELEASE_VERSION=1
ROCM_VERSION_STR=6.1.2
APP_CMAKE_CFG_FLAGS_DEBUG=-DCMAKE_C_FLAGS_DEBUG=-g3 -DCMAKE_CXX_FLAGS_DEBUG=-g3
OLDPWD=/home/cb88/rocm_sdk_builder/builddir/033_02_composable_kernel_jit
BUILD_RULE_ROOT_DIR=/home/cb88/rocm_sdk_builder/binfo
_=/usr/bin/env

/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
/home/cb88/rocm_sdk_builder/builddir/035_AMDMIGraphX
[78] Configuration: AMDMIGraphX
BINFO_APP_CMAKE_CFG: -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
BINFO_APP_CMAKE_CFG: -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib64 -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
Configuring AMDMIGraphX
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_LIBDIR=lib64 -DCMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 -DCMAKE_PREFIX_PATH=/opt/rocm_sdk_612/lib64/cmake;/opt/rocm_sdk_612/lib/cmake -DGPU_TARGETS=gfx906 -DHALF_INCLUDE_DIR=/opt/rocm_sdk_612/include -DCMAKE_INCLUDE_PATH=/opt/rocm_sdk_612/include -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm_sdk_612/bin/clang++ -DMIGRAPHX_USE_HIPRTC=ON -DMIGRAPHX_ENABLE_PYTHON=ON -DMIGRAPHX_ENABLE_GPU=ON /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX
-- half.hpp is at /opt/rocm_sdk_612/include
-- Enable gpu backend
-- Clang tidy found: 18.0.0git
-- Clang tidy checks: boost-*,bugprone-*,cert-*,clang-analyzer-*,clang-diagnostic-*,cppcoreguidelines-*,google-*,hicpp-multiway-paths-covered,hicpp-signed-bitwise,llvm-namespace-comment,misc-*,-misc-confusable-identifiers,-misc-use-anonymous-namespace,modernize-*,performance-*,readability-*,-bugprone-easily-swappable-parameters,-bugprone-implicit-widening-of-multiplication-result,-bugprone-macro-parentheses,-bugprone-signed-char-misuse,-bugprone-unchecked-optional-access,-cert-dcl37-c,-cert-dcl51-cpp,-cert-err33-c,-cert-str34-c,-clang-analyzer-alpha*,clang-analyzer-alpha.core.CallAndMessageUnInitRefArg,clang-analyzer-alpha.core.Conversion,clang-analyzer-alpha.core.IdenticalExpr,clang-analyzer-alpha.core.PointerArithm,clang-analyzer-alpha.core.PointerSub,clang-analyzer-alpha.core.TestAfterDivZero,clang-analyzer-alpha.cplusplus.InvalidIterator,clang-analyzer-alpha.cplusplus.IteratorRange,clang-analyzer-alpha.cplusplus.MismatchedIterator,clang-analyzer-alpha.cplusplus.MisusedMovedObject,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-disabled-macro-expansion,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-do-while,-cppcoreguidelines-avoid-const-or-ref-data-members,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-cppcoreguidelines-virtual-class-destructor,-cppcoreguidelines-avoid-capture-default-when-capturing-this,-cppcoreguidelines-rvalue-reference-param-not-moved,-google-readability-*,-google-runtime-int,-google-runtime-references,-misc-macro-parentheses,-misc-no-recursion,-
modernize-concat-nested-namespaces,-modernize-pass-by-value,-modernize-use-default-member-init,-modernize-use-nodiscard,-modernize-use-override,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-type-promotion-in-math-fn,-readability-braces-around-statements,-readability-convert-member-functions-to-static,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-identifier-length,-readability-named-parameter,-readability-redundant-string-init,-readability-suspicious-call-argument,-readability-uppercase-literal-suffix,-*-avoid-c-arrays,-*-explicit-constructor,-*-magic-numbers,-*-narrowing-conversions,-*-non-private-member-variables-in-classes,-*-use-auto,-*-use-emplace,-*-use-equals-default
-- Cppcheck found: 2.16.0
-- Parallel STL disabled
CMake Warning (dev) at /usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package):
  Policy CMP0167 is not set: The FindBoost module is removed.  Run "cmake
  --help-policy CMP0167" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

Call Stack (most recent call first):
  /usr/lib/cmake/msgpack-cxx/msgpack-cxx-config.cmake:40 (find_dependency)
  src/CMakeLists.txt:305 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found pybind11: /usr/include (found version "2.13.6")
-- Python 3.5 not found.
-- Python 3.6 not found.
-- Python 3.7 not found.
-- Python 3.8 not found.
-- Python 3.9 not found.
pyenv: python3.10-config: command not found

The `python3.10-config' command exists in these Python versions:
  3.10.15

Note: See 'pyenv help global' for tips on allowing both
      python2 and python3 to be found.
CMake Error at cmake/PythonModules.cmake:31 (message):
  Process failed:
  COMMAND;/home/cb88/.pyenv/shims/python3.10-config;--includes;OUTPUT_VARIABLE;_python_include_args
Call Stack (most recent call first):
  cmake/PythonModules.cmake:40 (py_exec)
  cmake/PythonModules.cmake:97 (find_python)
  src/py/CMakeLists.txt:31 (include)


-- Configuring incomplete, errors occurred!
configure failed: AMDMIGraphX

Note: python3.10-config does exist, so I'm not sure why this fails.
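
One possible explanation, judging from the pyenv message in the log: the python3.10-config that CMake finds is the pyenv shim in /home/cb88/.pyenv/shims, and a shim fails with exactly this message when its version (3.10.15 here) is installed but not the currently selected pyenv version. An untested workaround would be to keep the shims off PATH for the build. A minimal sketch; strip_path_entries is a hypothetical helper, and whether MIGraphX then simply skips Python 3.10 is an assumption:

```shell
# Print PATH_VALUE with every colon-separated entry containing PATTERN
# removed. Hypothetical helper.
strip_path_entries() {
    pattern="$1"
    path_value="$2"
    result=""
    old_ifs=$IFS
    IFS=:
    for entry in $path_value; do
        case "$entry" in
            *"$pattern"*) ;;                              # drop matching entry
            *) result="${result:+$result:}$entry" ;;      # keep the rest
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

It could be applied with PATH=$(strip_path_entries /.pyenv/shims "$PATH") in the shell that runs ./babs.sh -b.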

cb88 avatar Feb 03 '25 20:02 cb88

Tried building vllm at this point.

Note: I did install the setuptools_scm package via pacman -S python-setuptools-scm.

[cb88@M31-AR0 rocm_sdk_builder]$ ./babs.sh -b binfo/extra/ai_tools.blist
ROCM_TARGET_TRIPLED: x86_64-rocm-linux-gnu
ROCM_PYTHON_VERSION: v3.11.11
INSTALL_DIR_PREFIX_SDK_ROOT: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_SDK_AI_MODELS: /opt/rocm_sdk_models
selected GPUs: gfx906
build
SDK_CXX_COMPILER_DEFAULT: /opt/rocm_sdk_612/bin/hipcc
HIP_PLATFORM_DEFAULT: amd
HIP_PLATFORM: hcc
HIP_PATH: /opt/rocm_sdk_612
SDK_ROOT_DIR: /home/cb88/rocm_sdk_builder
SDK_SRC_ROOT_DIR: /home/cb88/rocm_sdk_builder/src_projects
BUILD_RULE_ROOT_DIR: /home/cb88/rocm_sdk_builder/binfo
PATCH_FILE_ROOT_DIR: /home/cb88/rocm_sdk_builder/patches/rocm-6.1.2
BUILD_ROOT_DIR: /home/cb88/rocm_sdk_builder/builddir
INSTALL_DIR_PREFIX_SDK_ROOT: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIPCC: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIP_CLANG: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_C_COMPILER: /opt/rocm_sdk_612
INSTALL_DIR_PREFIX_HIP_LLVM: /opt/rocm_sdk_612
SPACE_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
SEMICOLON_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
LF_SEPARATED_GPU_TARGET_LIST_DEFAULT: gfx906
HIP_PATH_DEFAULT: /opt/rocm_sdk_612
SDK_ROOT_DIR: /home/cb88/rocm_sdk_builder
APP_INFO_FULL_NAME: binfo/extra/ai_tools_dependencies.binfo

---------------------------
[1] BINFO_APP_NAME: pytorch_dependencies
BINFO FILE: binfo/extra/ai_tools_dependencies.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: pytorch_dependencies
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/pytorch_dependencies
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/ai_tools_dependencies
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/ai_tools_dependencies/.result_install
---------------------------

APP_INFO_FULL_NAME: binfo/extra/vllm.binfo

---------------------------
[2] BINFO_APP_NAME: vllm
BINFO FILE: binfo/extra/vllm.binfo
BINFO_APP_SRC_SUBDIR_BASENAME:
BINFO_APP_SRC_TOPDIR_BASENAME: vllm
BINFO_APP_SRC_DIR: /home/cb88/rocm_sdk_builder/src_projects/vllm
BINFO_APP_SRC_CLONE_DIR: /home/cb88/rocm_sdk_builder/src_projects/vllm
BINFO_APP_BUILD_DIR: /home/cb88/rocm_sdk_builder/builddir/vllm
HIP_PATH: /opt/rocm_sdk_612
INSTALL_DIR: /opt/rocm_sdk_612
HIP_PLATFORM: amd
TASK_RESULT_FILE_INSTALL: /home/cb88/rocm_sdk_builder/builddir/vllm/.result_install
---------------------------


/home/cb88/rocm_sdk_builder/builddir/vllm
[2] Building: vllm
[0] vllm, build command:
cd /home/cb88/rocm_sdk_builder/src_projects/vllm
[1] vllm, build command:
./build_rocm.sh /opt/rocm_sdk_612 gfx906
using rocm_root_directory specified: /opt/rocm_sdk_612
Using specified amd rocm gpu: gfx906
Traceback (most recent call last):
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 16, in <module>
    from setuptools_scm import get_version
ModuleNotFoundError: No module named 'setuptools_scm'
build failed: vllm
  error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612 gfx906
[cb88@M31-AR0 rocm_sdk_builder]$

cb88 avatar Feb 03 '25 20:02 cb88

You could probably do the following:

source /opt/rocm_sdk_612/bin/env_rocm.sh
pip install setuptools_scm

But vllm may also require PyTorch. I still have not had time to set up Arch Linux. I also have something of a storage problem, as I am now running three Docker builds as well (one for CDNA, one for RDNA1/2 and one for RDNA3 cards).
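The earlier log reports ROCM_PYTHON_VERSION v3.11.11, so a module installed through pacman into the system Python may not be visible to the interpreter the build uses. A small sketch for checking which interpreter sees the module, assuming the env_rocm.sh path mentioned above (the helper name has_py_module is hypothetical):

```shell
# Hypothetical helper: succeed if the interpreter on PATH can import the module.
has_py_module() {
    "${PYTHON:-python3}" -c \
        'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' \
        "$1"
}

# Path taken from the comment above; skipped when the SDK is not installed.
if [ -f /opt/rocm_sdk_612/bin/env_rocm.sh ]; then
    . /opt/rocm_sdk_612/bin/env_rocm.sh
fi

echo "python3 resolves to: $(command -v python3)"
if has_py_module setuptools_scm; then
    echo "setuptools_scm: importable"
else
    echo "setuptools_scm: missing, try: pip install setuptools_scm"
fi
```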

lamikr avatar Feb 04 '25 00:02 lamikr

I was actually able to build pytorch, but not torchvision or torchaudio.

I also built the older tensorflow version.

Went back and tried to build vllm again (after running pip install setuptools_scm)

and got this:

/home/cb88/rocm_sdk_builder/builddir/vllm
[2] Building: vllm
[0] vllm, build command:
cd /home/cb88/rocm_sdk_builder/src_projects/vllm
[1] vllm, build command:
./build_rocm.sh /opt/rocm_sdk_612 gfx906
using rocm_root_directory specified: /opt/rocm_sdk_612
Using specified amd rocm gpu: gfx906
No ROCm runtime is found, using ROCM_HOME='/opt/rocm_sdk_612'
Traceback (most recent call last):
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 631, in <module>
    version=get_vllm_version(),
            ^^^^^^^^^^^^^^^^^^
  File "/home/cb88/rocm_sdk_builder/src_projects/vllm/setup.py", line 525, in get_vllm_version
    raise RuntimeError("Unknown runtime environment")
RuntimeError: Unknown runtime environment
build failed: vllm
  error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612 gfx906
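Both the "No ROCm runtime is found" line and the "Unknown runtime environment" error would be consistent with the Python interpreter picking up a PyTorch build that does not report HIP support, or no PyTorch at all. This is an assumption, not something the log confirms. A quick sketch for checking which torch is loaded and what backend it reports (the helper name torch_backend is made up):

```shell
# Hypothetical helper: print the torch build visible to the current interpreter.
torch_backend() {
    "${PYTHON:-python3}" - <<'EOF'
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch: not importable")
else:
    try:
        import torch
        # torch.version.hip is None on builds without ROCm/HIP support.
        print(f"torch {torch.__version__} hip={torch.version.hip} cuda={torch.version.cuda}")
        print(f"loaded from {torch.__file__}")
    except Exception as exc:
        print(f"torch: import failed: {exc}")
EOF
}

torch_backend
```

Running this after sourcing /opt/rocm_sdk_612/bin/env_rocm.sh should show whether the SDK-built pytorch is the one being imported.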

cb88 avatar Feb 04 '25 03:02 cb88

Having another go at building this on Arch.

The llvm build failed to detect pfmlib.h.

I installed libpfm via pacman and it seems to continue building.

cb88 avatar Feb 18 '25 20:02 cb88

Fails here now.

CCLD     libucs.la
/opt/rocm_sdk_612/bin/ld: /opt/rocm_sdk_612/lib/libbfd.a(elf64.o): warning: relocation against `bfd_elf64_swap_reloca_out' in read-only section `.text'
/opt/rocm_sdk_612/bin/ld: /opt/rocm_sdk_612/lib/libbfd.a(bfd.o): relocation R_X86_64_PC32 against symbol `_bfd_error_buf' can not be used when making a shared object; recompile with -fPIC
/opt/rocm_sdk_612/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:1290: libucs.la] Error 1
make[3]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi/src/ucs'
make[2]: *** [Makefile:2086: all-recursive] Error 1
make[2]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi/src/ucs'
make[1]: *** [Makefile:802: all-recursive] Error 1
make[1]: Leaving directory '/home/cb88/rocm_sdk_builder/builddir/015_01_ucx_openmpi'
make: *** [Makefile:664: all] Error 2
build failed: ucx
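The linker message says that objects in the static /opt/rocm_sdk_612/lib/libbfd.a were compiled without -fPIC, so they cannot be linked into the shared libucs library. As a first diagnostic, it may help to see which libbfd variants (static or shared) exist in the SDK tree and on the system. A minimal sketch, where the helper name list_bfd_libs and the /usr/lib path are assumptions:

```shell
# Hypothetical helper: list static and shared libbfd files directly inside each directory.
list_bfd_libs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 \( -name 'libbfd*.a' -o -name 'libbfd*.so*' \) 2>/dev/null
    done
    return 0
}

list_bfd_libs /opt/rocm_sdk_612/lib /opt/rocm_sdk_612/lib64 /usr/lib
```

If only a static libbfd.a shows up in the SDK directories, the ucx link step is likely picking that one up ahead of any shared system copy.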

cb88 avatar Feb 18 '25 20:02 cb88