Was there a bad update to onnxruntime?
Hi there, I'm having some problems. I built this project a couple of weeks ago and everything went great; I even compiled https://github.com/likelovewant/ollama-for-amd against it and loved the results on my 6900HX (gfx1035). Then I did an OS reinstall. After downloading the repo again, I cannot get it to build past the onnxruntime package, 040. At first it didn't have TARGET_GPUS set, so I ran babs with the -c flag a couple of times, but no dice. Eventually I set it manually. That got a little farther, but triggers this error:
[ 16%] Building CXX object _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
In file included from /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/googletest-src/googletest/include/gtest/gtest-assertion-result.h:46,
from /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/googletest-src/googletest/include/gtest/gtest.h:63,
from /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/googletest-src/googletest/src/gtest-all.cc:38:
/home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/googletest-src/googletest/include/gtest/gtest-message.h:62:10: fatal error: absl/strings/has_absl_stringify.h: No such file or directory
62 | #include "absl/strings/has_absl_stringify.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
gmake[2]: *** [_deps/googletest-build/googletest/CMakeFiles/gtest.dir/build.make:76: _deps/googletest-build/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:4470: _deps/googletest-build/googletest/CMakeFiles/gtest.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
before eventually spiraling down to the same issues it had when no GPU target was set, with dozens or hundreds of errors like:
[ 24%] Building CXX object _deps/protobuf-build/CMakeFiles/libprotoc.dir/src/google/protobuf/compiler/java/primitive_field.cc.o
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:210: unsupported identifier "__NV_SATFINITE"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:210: unsupported device function "__nv_cvt_halfraw_to_fp8": return T(static_cast
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:216: unsupported identifier "__NV_NOSAT"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:216: unsupported device function "__nv_cvt_halfraw_to_fp8": return T(static_cast
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:222: unsupported identifier "__NV_SATFINITE"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:222: unsupported device function "__nv_cvt_float_to_fp8": return T(static_cast
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:228: unsupported identifier "__NV_NOSAT"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:228: unsupported device function "__nv_cvt_float_to_fp8": return T(static_cast
[ 24%] Building CXX object _deps/protobuf-build/CMakeFiles/libprotoc.dir/src/google/protobuf/compiler/java/primitive_field_lite.cc.o
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:264: unsupported identifier "__NV_E4M3"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/math/unary_elementwise_ops_impl.cu:265: unsupported identifier "__NV_E5M2"
[ 25%] Building HIP object _deps/composable_kernel-build/library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_comp_fp8_km_kn_mn_instance.cpp.o
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:31: unsupported identifier "__NV_E4M3"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:31: unsupported device function "__nv_cvt_fp8_to_halfraw": return __half2float(__nv_cvt_fp8_to_halfraw(v.val, __NV_E4M3));
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:38: unsupported identifier "__NV_E4M3"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:38: unsupported device function "__nv_cvt_fp8_to_halfraw": return __nv_cvt_fp8_to_halfraw(v.val, __NV_E4M3);
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:45: unsupported identifier "__NV_E5M2"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:45: unsupported device function "__nv_cvt_fp8_to_halfraw": return __half2float(__nv_cvt_fp8_to_halfraw(v.val, __NV_E5M2));
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:52: unsupported identifier "__NV_E5M2"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:52: unsupported device function "__nv_cvt_fp8_to_halfraw": return __nv_cvt_fp8_to_halfraw(v.val, __NV_E5M2);
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:59: unsupported identifier "__NV_SATFINITE"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:59: unsupported identifier "__NV_NOSAT"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:59: unsupported identifier "__NV_E4M3"
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/providers/cuda/tensor/cast_op.cu:59: unsupported device function "__nv_cvt_float_to_fp8": return Float8E4M3FN(static_cast
[ 30%] Hipify: onnxruntime/contrib_ops/cuda/bert/flash_attention/utils.h -> amdgpu/onnxruntime/contrib_ops/rocm/bert/flash_attention/utils.h
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/contrib_ops/cuda/bert/flash_attention/utils.h:125: unsupported device function "__shfl_xor_sync": x = op(x, __shfl_xor_sync(uint32_t(-1), x, OFFSET));
warning: /home/brock/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/contrib_ops/cuda/bert/flash_attention/utils.h:136: unsupported device function "__shfl_xor_sync": x = op(x, __shfl_xor_sync(uint32_t(-1), x, 1));
and failing with:
[ 41%] Built target device_gemm_instance
gmake: *** [Makefile:146: all] Error 2
Traceback (most recent call last):
File "/home/brock/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2955, in
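Because the log above mixes a large number of "warning:" lines with the one "fatal error:", it may help to pull out the first hard failure from a saved log. The following is only a minimal sketch: the file name build.log is hypothetical (for example captured with `./babs.sh -b 2>&1 | tee build.log`), and the patterns are assumptions taken from the messages quoted in this post, not an exhaustive list.

```shell
#!/bin/sh
# Print the first line of a saved build log that looks like a hard failure.
# Assumption: failures contain "error:" (which also covers "fatal error:")
# or a make-style "*** [...] Error N"; lines tagged "warning:" are skipped.
first_failure() {
    # $1: path to the saved log (hypothetical name, e.g. build.log)
    grep -n -E 'error:|\*\*\* .*Error [0-9]+' "$1" \
        | grep -v 'warning:' \
        | head -n 1
}

# Usage (prints "<line number>:<matching line>"):
#   first_failure build.log
```

The number in front of the output is the line in the log, so the surrounding context can be opened directly in an editor.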
Onnxruntime and deepspeed are the last two packages in the list of "core" apps built by default, but things like pytorch should already be working.
One thing you could also try is to remove the onnxruntime source folder and its build directory, and then rebuild onnxruntime. That way we could verify that no old pytorch build files are causing your problem.
$ rm -rf builddir/040_02_onnxruntime_deepspeed src_projects/onnxruntime/
$ ./babs.sh -b
If this does not work, can you check:
- Are the examples in /opt/rocm_sdk_612/docs/examples/pytorch/ working for you?
- When you rebuilt everything, did you also remove /opt/rocm_sdk_612, which contained the old build?
- Hmm, could you have accidentally built the rest of the system with the wrong GPU as a target?
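One more check that might narrow down the missing-header error reported earlier is to see whether any copy of absl/strings/has_absl_stringify.h exists in the trees the build could use. This is a rough sketch only; the helper name find_header is made up for illustration, and the two directories in the usage note are simply the paths that appear in the log.

```shell
#!/bin/sh
# List every copy of a header below a directory tree.
# $1: directory to search
# $2: header path exactly as written in the #include line
find_header() {
    find "$1" -type f -path "*/$2" 2>/dev/null
}

# Usage (no output means no copy was found under that directory):
#   find_header /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps absl/strings/has_absl_stringify.h
#   find_header /opt/rocm_sdk_612 absl/strings/has_absl_stringify.h
```

Run it after a failed build and before deleting the build directory, otherwise the first search has nothing to look at.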
OK, so after doing a full wipe, pulling down the repo again, and building everything, I hit the same issue. Let me go down your list, and then share some specifics.
- Yes. They work fine. The pytorch benchmark shows 27 seconds on CPU, 0.4 seconds on GPU.
- Yes. Nuked it from orbit, along with any files in home.
- No. I've now built (or tried to) 3 times trying to resolve this.
Here's what I'm seeing. At the start of the onnx package, it correctly identifies gfx1035:
/home/brock/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
[89] Post-configuration: onnxruntime
no post-configuration commands
post-configuration ok: onnxruntime
/home/brock/rocm_sdk_builder/builddir/040_01_onnxruntime_rocm_training
[89] Building: onnxruntime
[0] onnxruntime, build command:
cd /home/brock/rocm_sdk_builder/src_projects/onnxruntime
[1] onnxruntime, build command:
./build_rocm.sh /opt/rocm_sdk_612 gfx1035
using rocm_root_directory specified: /opt/rocm_sdk_612
Using specified amd rocm gpu: gfx1035
Linux distributions cmake version ok
3.28.3 >= 3.26.1
Linux distributions cmake version ok
3.28.3 >= 3.26.1
2025-03-15 20:30:41,792 build [DEBUG] - Command line arguments:
--build_dir /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux --allow_running_as_root --config Release --enable_training --build_wheel --parallel --skip_tests --build_shared_lib --use_rocm --rocm_home /opt/rocm_sdk_612 --use_migraphx --migraphx_home /opt/rocm_sdk_612 --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm_sdk_612/bin/clang++ CMAKE_INSTALL_PREFIX=/opt/rocm_sdk_612 'CMAKE_HIP_ARCHITECTURES=gfx1035
The first error it runs into is this:
-- The CXX compiler identification is GNU 13.3.0
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at CMakeLists.txt:55 (include):
Policy CMP0145 is not set: The Dart and FindDart modules are removed. Run
"cmake --help-policy CMP0145" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at /usr/share/cmake-3.28/Modules/Dart.cmake:47 (message):
Policy CMP0145 is not set: The Dart and FindDart modules are removed. Run
"cmake --help-policy CMP0145" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
Call Stack (most recent call first):
CMakeLists.txt:55 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- The HIP compiler identification is Clang 17.0.0
This is followed shortly by the output below, where the target is still identified as gfx1035:
CMAKE_HIP_COMPILER: /opt/rocm_sdk_612/bin/clang++
CMAKE_HIP_ARCHITECTURES: gfx1035
CMAKE_HIP_FLAGS:
CMAKE_HIP_FLAGS_RELEASE: -O3 -DNDEBUG
CMake Warning at CMakeLists.txt:442 (message):
onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is turned OFF due to incompatible
build combinations.
CMake Warning at CMakeLists.txt:449 (message):
onnxruntime_ENABLE_TRITON is turned OFF because it's designed to support
CUDA training on Linux only currently.
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
A bit later there's another of the policy messages:
patching file CMakeLists.txt
patching file onnx/common/file_utils.h
patching file onnx/defs/quantization/defs.cc
patching file onnx/defs/quantization/old.cc
patching file onnx/onnx_pb.h
patching file onnx/shape_inference/implementation.cc
[ 55%] No configure step for 'onnx-populate'
[ 66%] No build step for 'onnx-populate'
[ 77%] No install step for 'onnx-populate'
[ 88%] No test step for 'onnx-populate'
[100%] Completed 'onnx-populate'
[100%] Built target onnx-populate
CMake Deprecation Warning at /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/onnx-src/CMakeLists.txt:2 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
CMake Warning (dev) at /home/brock/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release/_deps/onnx-src/CMakeLists.txt:107 (find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
Then more:
[100%] Completed 'tensorboard-populate'
[100%] Built target tensorboard-populate
CMake Warning at CMakeLists.txt:1605 (message):
MPI and NCCL are disabled because build is on Windows or USE_NCCL is set to
OFF.
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Python Build is enabled
CMake Warning (dev) at onnxruntime_providers_migraphx.cmake:25 (find_package):
Policy CMP0144 is not set: find_package uses upper-case <PACKAGENAME>_ROOT
variables. Run "cmake --help-policy CMP0144" for policy details. Use the
cmake_policy command to set the policy and suppress this warning.
CMake variable MIGRAPHX_ROOT is set to:
/opt/rocm_sdk_612
For compatibility, find_package is ignoring the variable, but code in a
.cmake module might still use it.
Call Stack (most recent call first):
onnxruntime_providers.cmake:172 (include)
CMakeLists.txt:1744 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Looking for migraphx_program_run_async in migraphx::c
-- Looking for migraphx_program_run_async in migraphx::c - found
-- MIGRAPHX GPU STREAM SYNC is ENABLED
CMake Warning (dev) at onnxruntime_rocm_hipify.cmake:170:
Syntax Warning in cmake code at column 26
Argument not separated from preceding token by whitespace.
Call Stack (most recent call first):
onnxruntime_providers_rocm.cmake:5 (include)
onnxruntime_providers.cmake:184 (include)
CMakeLists.txt:1744 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at onnxruntime_rocm_hipify.cmake:171:
Syntax Warning in cmake code at column 25
Argument not separated from preceding token by whitespace.
Call Stack (most recent call first):
onnxruntime_providers_rocm.cmake:5 (include)
onnxruntime_providers.cmake:184 (include)
CMakeLists.txt:1744 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found Python3: /opt/rocm_sdk_612/bin/python3 (found version "3.11.11") found components: Interpreter
And at this point it no longer sees the GPU target:
GPU_TARGETS=
checking which targets are supported
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1010
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1010 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1031
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1031 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1032
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1032 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1035
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1035 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1036
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1036 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Failed
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1103
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1103 - Failed
Supported GPU_TARGETS=
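In the output above every COMPILER_HAS_TARGET_ID probe fails, including the one for gfx1035, and the resulting list is empty. To compare runs (for example before and after an update), the probe results can be condensed from a saved CMake log. This is a small sketch that assumes each result line ends in " - Success" or " - Failed", as in the lines quoted here; the log file name is hypothetical.

```shell
#!/bin/sh
# Condense "Performing Test COMPILER_HAS_TARGET_ID_<arch> - <result>" lines
# from a saved log into "<arch> <result>" pairs, one per line.
# Lines without a " - <result>" suffix (the probe start lines) are ignored.
target_check_results() {
    # $1: path to the saved log (hypothetical name, e.g. cmake.log)
    sed -n 's/^.*COMPILER_HAS_TARGET_ID_\([A-Za-z0-9]*\) - \([A-Za-z]*\)$/\1 \2/p' "$1"
}

# Usage:
#   target_check_results cmake.log
#   target_check_results cmake.log | grep -c ' Failed$'   # count failed probes
```

If every architecture fails, as it does here, that may point at the probe itself rather than at one particular target, but that is a guess and not something the log confirms.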
I did one update to onnxruntime. Can you test again by running:
./babs.sh -up
./babs.sh -b
I am getting:
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/optimizer/selectors_actions/selector_action_transformer.cc: In function ‘onnxruntime::common::Status onnxruntime::MatchAndProcess(Graph&, const GraphViewer&, Node&, bool&, const logging::Logger&, const std::string&, const SelectorActionRegistry&, const SatRuntimeOptimizationSaveContext*)’:
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/optimizer/selectors_actions/selector_action_transformer.cc:150:23: error: loop variable ‘op_schema’ creates a copy from type ‘const gsl::not_null<const onnx::OpSchema*>’ [-Werror=range-loop-construct]
150 | for (const auto op_schema : action_saved_state.produced_node_op_schemas) {
| ^~~~~~~~~
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/optimizer/selectors_actions/selector_action_transformer.cc:150:23: note: use reference type to prevent copying
150 | for (const auto op_schema : action_saved_state.produced_node_op_schemas) {
| ^~~~~~~~~
| &
and
n/pool.cc.o
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/session/inference_session.cc: In member function ‘onnxruntime::common::Status onnxruntime::InferenceSession::SaveToOrtFormat(const onnxruntime::PathString&) const’:
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/session/inference_session.cc:852:19: error: loop variable ‘op_schema’ creates a copy from type ‘const gsl::not_null<const onnx::OpSchema*>’ [-Werror=range-loop-construct]
852 | for (const auto op_schema : saved_runtime_optimization_produced_node_op_schemas_) {
| ^~~~~~~~~
/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/session/inference_session.cc:852:19: note: use reference type to prevent copying
852 | for (const auto op_schema : saved_runtime_optimization_produced_node_op_schemas_) {
| ^~~~~~~~~
| &
cc1plus: all warnings being treated as errors
gmake[2]: *** [CMakeFiles/onnxruntime_session.dir/build.make:177: CMakeFiles/onnxruntime_session.dir/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/onnxruntime/core/session/inference_session.cc.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
This error occurs while trying to build for gfx1150.
Last lines:
gmake: *** [Makefile:146: all] Error 2
Traceback (most recent call last):
File "/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2955, in <module>
sys.exit(main())
^^^^^^
File "/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 2847, in main
build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
File "/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 1736, in build_targets
run_subprocess(cmd_args, env=env)
File "/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/tools/ci_build/build.py", line 861, in run_subprocess
return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/tools/python/util/run.py", line 49, in run
completed_process = subprocess.run(
^^^^^^^^^^^^^^^
File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/home/yoni/Downloads/hashcat-6.2.6/rocm_sdk_builder/src_projects/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j20']' returned non-zero exit status 2.
build failed: onnxruntime
error in build cmd: ./build_rocm.sh /opt/rocm_sdk_612 gfx1150
Is this possibly related?
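One way to compare this failure with the earlier one is to list which warnings were promoted to errors. In the gfx1150 log above, both errors carry the tag [-Werror=range-loop-construct], and the compiler's own note suggests using a reference type for the loop variable. The sketch below only counts such tags in a saved log; the helper name and the log file name are hypothetical.

```shell
#!/bin/sh
# Count diagnostics per "-Werror=<name>" tag in a saved build log.
# Output format: "<name>: <count>", one line per distinct tag.
werror_summary() {
    # $1: path to the saved log (hypothetical name, e.g. build.log)
    sed -n 's/^.*\[-Werror=\([^]]*\)].*$/\1/p' "$1" \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Usage:
#   werror_summary build.log
```

An empty result would suggest the build is failing for some other reason than promoted warnings, such as the missing header in the first report.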