Build failing on Ubuntu 24.04 with 7800 XT
I've been trying to build both the 6.2.1 and 6.3.3 versions of the SDK, but they are failing for different reasons ~10 hours into the build. It looks like Tensile isn't getting installed correctly with 6.2.1, and 6.3.3 can't find hiprtc-targets.cmake. I'd greatly appreciate any direction or advice; I'm desperate to get my 7800 XT working.
Here's the error with 6.2.1:
Successfully installed Tensile-4.40.0 msgpack-1.1.0
[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python3 -m pip install --upgrade pip
-- using local Tensile from /mnt/linux/rocm_sdk_builder/src_projects/Tensile, copied to
-- Adding /mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv to CMAKE_PREFIX_PATH
-- Using AMDGPU_TARGETS: gfx1101
-- Tensile script: /mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/lib/python3.11/site-packages/Tensile/bin/TensileCreateLibrary
-- Tensile_CREATE_COMMAND: /mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/lib/python3.11/site-packages/Tensile/bin/TensileCreateLibrary;--merge-files;--separate-architectures;--lazy-library-loading;--no-short-file-names;--no-library-print-debug;--code-object-version=default;--cxx-compiler=hipcc;--jobs=6;--library-format=msgpack;--architecture=gfx1101;/mnt/linux/rocm_sdk_builder/src_projects/rocBLAS/library/src/blas3/Tensile/Logic/asm_full;/mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/Tensile;HIP
-- Tensile_MANIFEST_FILE_PATH: /mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/Tensile/library/TensileManifest.txt
'/mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/lib/python3.11/site-packages/Tensile/bin/TensileCreateLibrary' '--merge-files' '--separate-architectures' '--lazy-library-loading' '--no-short-file-names' '--no-library-print-debug' '--code-object-version=default' '--cxx-compiler=hipcc' '--jobs=6' '--library-format=msgpack' '--architecture=gfx1101' '/mnt/linux/rocm_sdk_builder/src_projects/rocBLAS/library/src/blas3/Tensile/Logic/asm_full' '/mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/Tensile' 'HIP' '--generate-manifest-and-exit'
Traceback (most recent call last):
File "/mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/lib/python3.11/site-packages/Tensile/bin/TensileCreateLibrary", line 30, in <module>
from Tensile.TensileCreateLibrary import TensileCreateLibrary
ModuleNotFoundError: No module named 'Tensile.TensileCreateLibrary'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/lib/python3.11/site-packages/Tensile/bin/TensileCreateLibrary", line 37, in <module>
from Tensile.TensileCreateLibrary import TensileCreateLibrary
ModuleNotFoundError: No module named 'Tensile.TensileCreateLibrary'
CMake Error at /mnt/linux/rocm_sdk_builder/builddir/023_02_rocBLAS/virtualenv/cmake/TensileConfig.cmake:277 (message):
Error creating Tensile library: 1
Call Stack (most recent call first):
library/src/CMakeLists.txt:74 (TensileCreateLibraryFiles)
EDIT 1: "unset PYTHONPATH" was all I needed to get this going again.
Trying the 6.3.3 build:
CMake Error at /opt/rocm_sdk_633/lib/cmake/hiprtc/hiprtc-config.cmake:63 (include):
include could not find requested file:
/opt/rocm_sdk_633/lib/cmake/hiprtc/hiprtc-targets.cmake
Call Stack (most recent call first):
cmake/public/LoadHIP.cmake:73 (find_package)
cmake/public/LoadHIP.cmake:160 (find_package_and_print_version)
aten/CMakeLists.txt:67 (include)
I did get 6.2.1 compiled and installed. My hope was that I'd find better performance in ComfyUI than with the current ROCm from AMD (forcing gfx1100), but so far it's all a bit slower. I really just want a way to get flash attention working.
What did you need to do to resolve your problem with the 6.2.1 build?
The wip/rocm_sdk_builder_633 branch has now had a lot of updates, and I have been able to resolve the iGPU problems I was seeing there. So far I have tested with the gfx1030/6800, gfx1035/680M, gfx1102/7700S, gfx1103/780M, gfx1150 and gfx1151. I will now also do builds for gfx906/MI50 and gfx1010/5700 XT, but have not tested those yet.
There is at least one MIOpen issue that I need to understand better and figure out the best way to resolve. (If I run pytorch_gpu_benchmark, MIOpen starts to generate a performance db to find optimal solutions. The problem is that this itself takes a long time, and I am not sure yet what the best way to handle it is.)
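One possible direction (a sketch using MIOpen's documented user-db environment variable; the cache path here is only an example):
# MIOpen caches tuned kernels in a per-user performance db. Pointing every
# run at one persistent location means the slow search only happens once.
export MIOPEN_USER_DB_PATH="$HOME/.cache/miopen_tuned"
mkdir -p "$MIOPEN_USER_DB_PATH"
./run_benchmarks.sh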
What did you need to do to resolve your problem with the 6.2.1 build?
"unset PYTHONPATH" fixed it in my case. I can run ComfyUI, but it was slower than running it without the SDK.
I tried building the 633 branch yesterday, but it kept failing with "double free or corruption (!prev)" in torch. I'll give it another try tonight.
Thank you so much for your work on this!
EDIT: Regarding your MIOpen issue... it's a long shot, but would the MIOPEN_FIND_MODE=FAST environment variable help in this case?
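Something like this (a sketch; FAST is one of MIOpen's documented find modes, and run_benchmarks.sh stands in for whatever workload triggers the tuning):
# FAST falls back to a heuristically chosen kernel on a tuning-db miss
# instead of running the full search, trading some kernel performance for
# a much shorter warm-up.
export MIOPEN_FIND_MODE=FAST
./run_benchmarks.sh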
Thanks for the suggestion.
In ROCm 6.1.2 I was getting a similar-looking double free error every time a PyTorch application exited. To fix it I needed to create a patch for amdsmi that changed the way all kinds of messages were created. (The code used static strings, and something then tried to call free on them.) I have not included any patches to amdsmi in the 6.3.3 version, and I have not seen any double free errors yet.
I can try testing ComfyUI myself to check how it behaves once I get this MIOpen problem solved. I tested it maybe 6 months ago but did not package it then; I think it would make sense to include it now. There are also a couple of other packages that people have asked to be integrated. One of them requires Python 3.12, however, so I need to test whether we can get that running as well.
One difference I just noticed is that the share/miopen files have not been installed in the 6.3.3 build for now. For example, these are missing at the moment in the 6.3.3 build:
/opt/rocm_sdk_612/share/miopen/db/gfx1030_36.db and /opt/rocm_sdk_612/share/miopen/perf_models/
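A quick way to compare the two installs (a sketch; paths assume the default /opt install prefixes shown above):
# List what the 6.1.2 build shipped versus what 6.3.3 actually installed.
ls /opt/rocm_sdk_612/share/miopen/db/ /opt/rocm_sdk_612/share/miopen/perf_models/
ls /opt/rocm_sdk_633/share/miopen/ 2>/dev/null || echo "share/miopen missing in 6.3.3"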
These are the benchmark logs I am seeing with pytorch_gpu_benchmark on 6.3.3:
MIOPEN_FIND_MODE=FAST ./run_benchmarks.sh
...
Benchmark start time: 2025/03/30 19:41:53
Precision: half, model list: LARGE
Benchmarking training, precision: half, model: mnasnet0_5
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #0 0.024032 54
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.33822 (default time 0.03216)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #0 0.023096 59
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.3249 (default time 0.0306)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #48 0.0865122 5
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.20307 (default time 0.10408)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf2x3: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #59 0.106132 5
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.09336 (default time 0.116041)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #0 0.0183161 23
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.19895 (default time 0.02196)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #0 0.0179001 57
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.39887 (default time 0.02504)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf3x2: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #4 0.163876 1
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.04909 (default time 0.171921)
MIOpen(HIP): Warning [GetAllConfigs] ConvBinWinogradRxSf2x3: Searching the best solution among 60...
MIOpen(HIP): Warning [GenericSearch] Done: 60/0/60, best #35 0.138848 5
MIOpen(HIP): Warning [GenericSearch] ...Score: 1.2768 (default time 0.177281)
MIOpen(HIP): Warning [SearchImpl] Searching the best solution in the 9 dim space. Please, be patient...
MIOpen(HIP): Warning [SearchImpl] Runs left: 863, min time so far: 0.06696, curr time: 0.06696 16,16,16,16,1,1,1,1,1
My 633 build is now failing while compiling SuiteSparse/GraphBLAS. Here's the relevant output:
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:573:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
573 | case ZSTD_c_targetCBlockSize:
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:689:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
689 | case ZSTD_c_targetCBlockSize:
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:748:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
748 | case ZSTD_c_targetCBlockSize:
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:941:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
941 | case ZSTD_c_targetCBlockSize :
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:943:24: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
943 | BOUNDCHECK(ZSTD_c_targetCBlockSize, value);
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
In file included from /mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/Source/GB_zstd.c:63:
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/compress/zstd_compress.c:1114:10: error: use of undeclared identifier 'ZSTD_c_experimentalParam6'
1114 | case ZSTD_c_targetCBlockSize :
| ^
/mnt/linux/rocm_sdk_builder_633/src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/common/../common/../zstd.h:1952:33: note: expanded from macro 'ZSTD_c_targetCBlockSize'
1952 | #define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
| ^
6 errors generated.
make[2]: *** [GraphBLAS/CMakeFiles/GraphBLAS.dir/build.make:41614: GraphBLAS/CMakeFiles/GraphBLAS.dir/Source/GB_zstd.c.o] Error 1
make[2]: Leaving directory '/mnt/linux/rocm_sdk_builder_633/builddir/023_06_SuiteSparse'
make[1]: *** [CMakeFiles/Makefile2:2901: GraphBLAS/CMakeFiles/GraphBLAS.dir/all] Error 2
make[1]: Leaving directory '/mnt/linux/rocm_sdk_builder_633/builddir/023_06_SuiteSparse'
make: *** [Makefile:146: all] Error 2
build failed: SuiteSparse
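One way to sanity-check the vendored header before rebuilding (a sketch; the path is the one from the include trace above, normalized, and the identifier should be declared there as part of zstd's experimental parameter enum):
# Run from the rocm_sdk_builder_633 checkout. If the grep finds nothing,
# the header is truncated or mismatched and the zstd sources need to be
# restored before rebuilding SuiteSparse.
grep -n "ZSTD_c_experimentalParam6" \
  src_projects/SuiteSparse/GraphBLAS/zstd/zstd_subset/zstd.h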
I think my previous issue was because the power went out during the build, maybe while zstd was being built. It built again and has now continued, after I ran:
./babs.sh -rs binfo/core/023_06_SuiteSparse.binfo
./babs.sh -b
After a few hours, it fails building torchvision with:
removing build/bdist.linux-x86_64/wheel
double free or corruption (!prev)
./build_rocm.sh: line 18: 1110357 Aborted (core dumped) ROCM_PATH=${install_dir_prefix_rocm} FORCE_CUDA=1 TORCHVISION_USE_NVJPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 CC=${CMAKE_C_COMPILER} CXX=${CMAKE_CXX_COMPILER} python setup.py bdist_wheel
build failed: pytorch_vision
error in build cmd: ./build_rocm.sh /opt/rocm_sdk_633
Line 18 is forcing the install for CUDA:
ROCM_PATH=${install_dir_prefix_rocm} FORCE_CUDA=1 TORCHVISION_USE_NVJPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 CC=${CMAKE_C_COMPILER} CXX=${CMAKE_CXX_COMPILER} python setup.py bdist_wheel
I tried changing that on a fresh build just now, and it still fails the same way:
adding 'torchvision-0.22.0+781660c.dist-info/licenses/LICENSE'
adding 'torchvision-0.22.0+781660c.dist-info/METADATA'
adding 'torchvision-0.22.0+781660c.dist-info/WHEEL'
adding 'torchvision-0.22.0+781660c.dist-info/top_level.txt'
adding 'torchvision-0.22.0+781660c.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
double free or corruption (!prev)
./build_rocm.sh: line 18: 4246 Aborted (core dumped) ROCM_PATH=${install_dir_prefix_rocm} FORCE_CUDA=0 USE_ROCM=1 USE_CUDA=0 TORCHVISION_USE_NVJPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 CC=${CMAKE_C_COMPILER} CXX=${CMAKE_CXX_COMPILER} python setup.py bdist_wheel
build failed: pytorch_vision
error in build cmd: ./build_rocm.sh /opt/rocm_sdk_633
I get the impression this has to do with how PyTorch is built. The same error happens when I try to manually build torchaudio, as well.
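If it helps anyone, the abort happens only after the wheel archive is finished (the "adding ... RECORD" lines above), so the artifact may still be usable (a sketch; run from the torchvision source directory, and the exact wheel filename will differ):
# setup.py bdist_wheel writes the wheel to dist/ before the crash at
# interpreter exit, so it can potentially be installed by hand.
ls dist/torchvision-*.whl
pip install dist/torchvision-*.whl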