XLA_TARGET=rocm compilation failed with "crosstool_wrapper_driver_is_not_gcc failed"

Open Awlexus opened this issue 1 year ago • 7 comments

Hi, I've been trying to get GPU support running, but I keep running into this issue. I was first looking at this issue to get it running. I added the dependencies like this:

# mix.exs
      {:nx, github: "elixir-nx/nx", sparse: "nx", override: true},
      {:exla, github: "elixir-nx/nx", sparse: "exla", override: true}

I made sure to install the dependencies mentioned in this comment (adjusted for arch linux):

$ sudo pacman -S miopen-hip hipfft rocrand \
    hipsparse hipsolver rccl hip-runtime-amd \
    rocfft roctracer hipblas rocm-device-libs rocsolver rocblas

And then tried to compile it with $ XLA_BUILD=true XLA_TARGET=rocm mix compile

Compilation logs


==> xla
Compiling 2 files (.ex)
Generated xla app
mkdir -p /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb &&
cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb &&
git init &&
git remote add origin https://github.com/openxla/xla.git &&
git fetch --depth 1 origin 771e38178340cbaaef8ff20f44da5407c15092cb &&
git checkout FETCH_HEAD &&
rm /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelversion
Initialized empty Git repository in /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.git/
From https://github.com/openxla/xla
 * branch 771e38178340cbaaef8ff20f44da5407c15092cb -> FETCH_HEAD
Note: switching to 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example:

  git switch -c

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 771e381 [XLA:GPU] Check tensor_float_32_execution_enabled() in Triton codegen too
rm -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension &&
ln -s "/hdd/programming/elixir/fusemega/deps/xla/extension" /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension &&
cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb &&
bazel build --define "framework_shared_object=false" -c opt --config=rocm --action_env=HIP_PLATFORM=hcc --action_env=TF_ROCM_AMDGPU_TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx1030" //xla/extension:xla_extension &&
mkdir -p /home/awlex/.cache/xla/0.6.0/cache/build/ &&
cp -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/bazel-bin/xla/extension/xla_extension.tar.gz /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz Starting local Bazel server and connecting to it... INFO: Reading 'startup' options from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --windows_enable_symlinks INFO: Options provided by the client: Inherited 'common' options: --isatty=0 --terminal_columns=80 INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: Inherited 'common' options: --experimental_repo_remote_exec INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: 'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility INFO: Found applicable config definition build:short_logs in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --output_filter=DONT_MATCH_ANYTHING INFO: Found applicable config definition build:v2 in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1 INFO: Found applicable config definition build:rocm in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: 
--crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --config=no_tfrt INFO: Found applicable config definition build:no_tfrt in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ifrt,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflo
w/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils INFO: Found applicable config definition build:linux in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes INFO: Found applicable config definition build:dynamic_kernels in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS Loading: Loading: 0 packages loaded DEBUG: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/third_party/repo.bzl:132:14: Warning: skipping import of repository 'llvm-raw' because it already exists. 
Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded Loading: 0 packages loaded currently loading: xla/extension Analyzing: target //xla/extension:xla_extension (1 packages loaded, 0 targets configured) Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured) Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured) Analyzing: target //xla/extension:xla_extension (179 packages loaded, 13765 targets configured) INFO: Analyzed target //xla/extension:xla_extension (182 packages loaded, 16076 targets configured). INFO: Found 1 target... [0 / 196] [Prepa] Writing script xla/extension/xla_extension_headers.genrule_script.sh [44 / 4,961] Compiling src/google/protobuf/compiler/cpp/file.cc; 3s local ... (12 actions, 11 running) [71 / 4,961] Compiling src/google/protobuf/compiler/cpp/field.cc; 1s local ... (12 actions, 11 running) [97 / 4,961] Compiling src/google/protobuf/compiler/command_line_interface.cc; 4s local ... (12 actions, 11 running) [133 / 4,961] Compiling src/google/protobuf/util/internal/protostream_objectsource.cc; 2s local ... (12 actions, 11 running) [173 / 4,961] Compiling src/google/protobuf/descriptor.cc; 8s local ... (12 actions, 11 running) [256 / 5,172] Compiling llvm/lib/TableGen/TGParser.cpp [for host]; 3s local ... (12 actions, 11 running) [339 / 5,172] Compiling llvm/lib/Support/KnownBits.cpp [for host]; 2s local ... (12 actions, 11 running) [448 / 5,400] Compiling llvm/lib/Support/VirtualFileSystem.cpp; 4s local ... (12 actions, 11 running) [542 / 5,400] Compiling llvm/lib/Support/Caching.cpp; 1s local ... (12 actions, 11 running) [897 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDocGen.cpp; 4s local ... (12 actions, 11 running) [958 / 6,732] Compiling llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp [for host]; 12s local ... 
(12 actions, 11 running) [1,130 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp [for host]; 10s local ... (12 actions, 11 running) [1,362 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_register_bank_genrule; 22s local ... (12 actions, 11 running) [1,813 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_asm_matcher_genrule; 16s local ... (12 actions, 11 running) [2,160 / 6,907] Compiling xla/hlo/utils/hlo_sharding_util.cc; 5s local ... (12 actions running) [2,285 / 6,907] Compiling xla/service/hlo_rematerialization.cc; 12s local ... (12 actions running) [2,446 / 6,907] Compiling llvm/lib/IR/AutoUpgrade.cpp; 7s local ... (12 actions running) [2,624 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 14s local ... (12 actions, 11 running) [2,758 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 12s local ... (12 actions, 11 running) [2,942 / 6,907] Compiling src/cpu/x64/gemm/f32/jit_avx2_f32_copy_an_kern_autogen.cpp; 8s local ... (12 actions, 11 running) [3,150 / 6,907] Compiling src/cpu/x64/jit_uni_resampling_kernel.cpp; 13s local ... (12 actions, 11 running) [3,366 / 6,907] Compiling src/cpu/x64/jit_brgemm_conv.cpp; 45s local ... (12 actions, 11 running) [3,642 / 6,908] Compiling llvm/lib/Passes/PassBuilder.cpp; 45s local ... (12 actions, 11 running) [3,929 / 6,908] Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp; 56s local ... (12 actions, 11 running) [4,340 / 6,908] Compiling stablehlo/dialect/StablehloOps.cpp; 41s local ... (12 actions, 11 running) [4,628 / 6,908] Compiling mlir/lib/Dialect/SPIRV/IR/SPIRVOpDefinition.cpp; 33s local ... (12 actions, 11 running) [4,938 / 6,908] Compiling llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp; 13s local ... (12 actions, 11 running) [5,317 / 6,908] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 22s local ... 
(12 actions, 11 running) [5,761 / 6,908] Compiling mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp; 33s local ... (12 actions, 11 running) [6,253 / 6,908] Compiling xla/mlir_hlo/mhlo/IR/hlo_ops.cc; 67s local ... (12 actions, 11 running) [6,854 / 6,909] Compiling xla/service/gpu/runtime/fused_attention.cc; 64s local ... (12 actions, 11 running) ERROR: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/service/gpu/BUILD:257:11: Compiling xla/service/gpu/ir_emitter_unnested.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer ... (remaining 356 arguments skipped) /home/awlex/.cache/bazel/_bazel_awlex/74b6e6c2abb213e1ba59aee5534c65a2/execroot/xla/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:23: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13 import pipes In file included from ./xla/shape_util.h:36, from ./xla/index_util.h:25, from ./xla/literal.h:41, from ./xla/hlo/ir/dfs_hlo_visitor.h:26, from ./xla/hlo/ir/hlo_computation.h:32, from ./xla/service/gpu/ir_emitter_unnested.h:31, from xla/service/gpu/ir_emitter_unnested.cc:16: external/com_google_absl/absl/log/check.h:57: warning: "CHECK" redefined 57 | #define CHECK(condition) ABSL_LOG_INTERNAL_CHECK_IMPL((condition), #condition) | In file included from external/tsl/tsl/platform/logging.h:26, from external/tsl/tsl/platform/status.h:34, from ./xla/status.h:19, from ./xla/statusor.h:18, from ./xla/hlo/ir/hlo_opcode.h:24, from ./xla/hlo/ir/dfs_hlo_visitor.h:25: external/tsl/tsl/platform/default/logging.h:308: note: this is the location of the previous definition 308 | #define CHECK(condition)
| external/com_google_absl/absl/log/check.h:65: warning: "QCHECK" redefined 65 | #define QCHECK(condition) ABSL_LOG_INTERNAL_QCHECK_IMPL((condition), #condition) | external/tsl/tsl/platform/default/logging.h:542: note: this is the location of the previous definition 542 | #define QCHECK(condition) CHECK(condition) | external/com_google_absl/absl/log/check.h:88: warning: "DCHECK" redefined 88 | #define DCHECK(condition) ABSL_LOG_INTERNAL_DCHECK_IMPL((condition), #condition) | external/tsl/tsl/platform/default/logging.h:521: note: this is the location of the previous definition 521 | #define DCHECK(condition)
| external/com_google_absl/absl/log/check.h:116: warning: "CHECK_EQ" redefined 116 | #define CHECK_EQ(val1, val2)
| external/tsl/tsl/platform/default/logging.h:499: note: this is the location of the previous definition 499 | #define CHECK_EQ(val1, val2) CHECK_OP(Check_EQ, ==, val1, val2) | external/com_google_absl/absl/log/check.h:118: warning: "CHECK_NE" redefined 118 | #define CHECK_NE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:500: note: this is the location of the previous definition 500 | #define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2) | external/com_google_absl/absl/log/check.h:120: warning: "CHECK_LE" redefined 120 | #define CHECK_LE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:501: note: this is the location of the previous definition 501 | #define CHECK_LE(val1, val2) CHECK_OP(Check_LE, <=, val1, val2) | external/com_google_absl/absl/log/check.h:122: warning: "CHECK_LT" redefined 122 | #define CHECK_LT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:502: note: this is the location of the previous definition 502 | #define CHECK_LT(val1, val2) CHECK_OP(Check_LT, <, val1, val2) | external/com_google_absl/absl/log/check.h:124: warning: "CHECK_GE" redefined 124 | #define CHECK_GE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:503: note: this is the location of the previous definition 503 | #define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2) | external/com_google_absl/absl/log/check.h:126: warning: "CHECK_GT" redefined 126 | #define CHECK_GT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:504: note: this is the location of the previous definition 504 | #define CHECK_GT(val1, val2) CHECK_OP(Check_GT, >, val1, val2) | external/com_google_absl/absl/log/check.h:128: warning: "QCHECK_EQ" redefined 128 | #define QCHECK_EQ(val1, val2)
| external/tsl/tsl/platform/default/logging.h:543: note: this is the location of the previous definition 543 | #define QCHECK_EQ(x, y) CHECK_EQ(x, y) | external/com_google_absl/absl/log/check.h:130: warning: "QCHECK_NE" redefined 130 | #define QCHECK_NE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:544: note: this is the location of the previous definition 544 | #define QCHECK_NE(x, y) CHECK_NE(x, y) | external/com_google_absl/absl/log/check.h:132: warning: "QCHECK_LE" redefined 132 | #define QCHECK_LE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:545: note: this is the location of the previous definition 545 | #define QCHECK_LE(x, y) CHECK_LE(x, y) | external/com_google_absl/absl/log/check.h:134: warning: "QCHECK_LT" redefined 134 | #define QCHECK_LT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:546: note: this is the location of the previous definition 546 | #define QCHECK_LT(x, y) CHECK_LT(x, y) | external/com_google_absl/absl/log/check.h:136: warning: "QCHECK_GE" redefined 136 | #define QCHECK_GE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:547: note: this is the location of the previous definition 547 | #define QCHECK_GE(x, y) CHECK_GE(x, y) | external/com_google_absl/absl/log/check.h:138: warning: "QCHECK_GT" redefined 138 | #define QCHECK_GT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:548: note: this is the location of the previous definition 548 | #define QCHECK_GT(x, y) CHECK_GT(x, y) | external/com_google_absl/absl/log/check.h:140: warning: "DCHECK_EQ" redefined 140 | #define DCHECK_EQ(val1, val2)
| external/tsl/tsl/platform/default/logging.h:531: note: this is the location of the previous definition 531 | #define DCHECK_EQ(x, y) _TF_DCHECK_NOP(x, y) | external/com_google_absl/absl/log/check.h:142: warning: "DCHECK_NE" redefined 142 | #define DCHECK_NE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:532: note: this is the location of the previous definition 532 | #define DCHECK_NE(x, y) _TF_DCHECK_NOP(x, y) | external/com_google_absl/absl/log/check.h:144: warning: "DCHECK_LE" redefined 144 | #define DCHECK_LE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:533: note: this is the location of the previous definition 533 | #define DCHECK_LE(x, y) _TF_DCHECK_NOP(x, y) | external/com_google_absl/absl/log/check.h:146: warning: "DCHECK_LT" redefined 146 | #define DCHECK_LT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:534: note: this is the location of the previous definition 534 | #define DCHECK_LT(x, y) _TF_DCHECK_NOP(x, y) | external/com_google_absl/absl/log/check.h:148: warning: "DCHECK_GE" redefined 148 | #define DCHECK_GE(val1, val2)
| external/tsl/tsl/platform/default/logging.h:535: note: this is the location of the previous definition 535 | #define DCHECK_GE(x, y) _TF_DCHECK_NOP(x, y) | external/com_google_absl/absl/log/check.h:150: warning: "DCHECK_GT" redefined 150 | #define DCHECK_GT(val1, val2)
| external/tsl/tsl/platform/default/logging.h:536: note: this is the location of the previous definition
  536 | #define DCHECK_GT(x, y) _TF_DCHECK_NOP(x, y)
      |
xla/service/gpu/ir_emitter_unnested.cc: In member function ‘tsl::Status xla::gpu::IrEmitterUnnested::EmitCubDeviceRadixSort(mlir::Operation*)’:
xla/service/gpu/ir_emitter_unnested.cc:1472:33: error: ‘CubSortThunk’ was not declared in this scope
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                                 ^~~~~~~~~~~~
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: no matching function for call to ‘make_unique< >(xla::gpu::Thunk::ThunkInfo, xla::PrimitiveType, std::optional<xla::PrimitiveType>, std::vector<xla::BufferAllocation::Slice>&, std::vector<xla::BufferAllocation::Slice>&, xla::BufferAllocation::Slice&, bool)’
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/memory:78,
                 from ./xla/service/gpu/ir_emitter_unnested.h:21:
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note: candidate: ‘template<class _Tp, class ... _Args> std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...)’
 1069 |     make_unique(_Args&&... __args)
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note: template argument deduction/substitution failed:
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note: candidate: ‘template<class _Tp> std::__detail::__unique_ptr_array_t<_Tp> std::make_unique(size_t)’
 1084 |     make_unique(size_t __num)
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note: candidate expects 1 argument, 7 provided
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note: candidate: ‘template<class _Tp, class ... _Args> std::__detail::__invalid_make_unique_t<_Tp> std::make_unique(_Args&& ...)’ (deleted)
 1094 |     make_unique(_Args&&...) = delete;
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note: template argument deduction/substitution failed:
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/unistd.h:226,
                 from external/com_google_absl/absl/base/internal/thread_identity.h:27,
                 from external/com_google_absl/absl/synchronization/mutex.h:70,
                 from external/com_google_absl/absl/strings/internal/cordz_info.h:31,
                 from external/com_google_absl/absl/strings/cord.h:91,
                 from external/com_google_absl/absl/container/internal/hash_function_defaults.h:56,
                 from external/com_google_absl/absl/container/flat_hash_map.h:41,
                 from ./xla/hlo/ir/hlo_computation.h:26:
external/tsl/tsl/concurrency/async_value.h: In instantiation of ‘static void tsl::internal::ConcreteAsyncValue<T>::VerifyOffsets() [with T = tsl::DummyValueForErrorAsyncValue]’:
external/tsl/tsl/concurrency/async_value.h:536:18: required from ‘tsl::internal::ConcreteAsyncValue<T>::ConcreteAsyncValue(absl::lts_20230802::Status) [with T = tsl::DummyValueForErrorAsyncValue]’
external/tsl/tsl/concurrency/async_value.h:727:30: required from here
external/tsl/tsl/concurrency/async_value.h:702:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue<tsl::DummyValueForErrorAsyncValue>’ is conditionally-supported [-Winvalid-offsetof]
  702 |     static_assert(offsetof(ConcreteAsyncValue<T>, data_store.data) ==
      |                            ^
external/tsl/tsl/concurrency/async_value.h:706:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue<tsl::DummyValueForErrorAsyncValue>’ is conditionally-supported [-Winvalid-offsetof]
  706 |     static_assert(offsetof(ConcreteAsyncValue<T>, data_store.error) ==
      |                            ^
Target //xla/extension:xla_extension failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2466.368s, Critical Path: 158.87s
INFO: 6899 processes: 469 internal, 6430 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
make: *** [Makefile:26: /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz] Error 1
could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla --force", update it with "mix deps.update xla" or clean it with "mix deps.clean xla"
==> fusemega
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using Ubuntu or any other Debian-based system, install the packages "build-essential". Also install "erlang-dev" package if not included in your Erlang/OTP version. If you're on Fedora, run "dnf group install 'Development Tools'".

Awlexus, Dec 27 '23 22:12

Hey @Awlexus, this could be an issue with the build environment. To be sure, you can alternatively use the Docker scripts (./build.sh rocm), then use XLA_ARCHIVE_URL=file:///path/to/build.tar.gz accordingly.
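A `file://` URL for a locally built archive needs an absolute path. The following is a minimal sketch, assuming a POSIX shell and a `readlink` that supports `-f` (GNU coreutils does). `archive_url` is a hypothetical helper, not part of xla or its Docker scripts.

```shell
# Hypothetical helper: turn a local archive path into a file:// URL.
# Assumes `readlink -f` is available to make the path absolute.
archive_url() {
  printf 'file://%s\n' "$(readlink -f "$1")"
}
```

It could then be used as `XLA_ARCHIVE_URL="$(archive_url /path/to/archive.tar.gz)" mix compile`, where the archive path is a placeholder for wherever the Docker build wrote its output, alongside whatever other variables the build already uses.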

In case your GPU uses gfx1100 (7900 XTX), you may need to use a more recent XLA revision as per https://github.com/elixir-nx/xla/issues/63#issuecomment-1844195261 (either by setting OPENXLA_GIT_REV with mix compile or changing the Makefile directly in case of the Docker build).
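Overriding the revision from the command line might look like the sketch below. `OPENXLA_GIT_REV`, `XLA_BUILD`, and `XLA_TARGET` are the variables named in this thread; the function name `build_xla_rocm` and the placeholder revision are hypothetical. The block only defines the function, so nothing is built until it is called.

```shell
# Placeholder: substitute the openxla/xla commit you want to build against.
OPENXLA_GIT_REV="${OPENXLA_GIT_REV:-REPLACE_WITH_COMMIT_SHA}"

# Hypothetical wrapper around the compile command used earlier in this thread.
# The variables are passed only to this one `mix compile` invocation.
build_xla_rocm() {
  XLA_BUILD=true XLA_TARGET=rocm OPENXLA_GIT_REV="$OPENXLA_GIT_REV" mix compile
}
```

Running `build_xla_rocm` from the project directory then starts the build with that revision.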

jonatanklosko, Dec 28 '23 06:12

Thanks @jonatanklosko, I was able to compile it by using a more recent XLA git ref, but I could not get it to start on the GPU. I tried again using the Docker script to build it (which took a long time) and hit the same error. It was able to allocate the memory, but the operating system stopped the program soon after. I'm not sure where exactly this error comes from.

Error log


2023-12-28 23:43:05.394087: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fa4c018dc30 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.
fish: Job 1, 'iex -S mix phx.server $argv' terminated by signal SIGSEGV (Address boundary error)

Awlexus, Dec 28 '23 22:12

Hmm, do you do any Nx stuff on boot? Does the error happen every time? I assume it doesn't happen if you use CPU only? You can also try ELIXIR_ERL_OPTIONS="+sssdio 128 +sssdcpu 128", though it rather helps with segfaults.

jonatanklosko, Dec 29 '23 05:12

Sorry for the late reply, I was away for a bit.

I'm not sure what changed since then, but now I'm getting a different error message. I already tried to write out a reply, before I noticed the change, so I added it at the end in case it could be helpful.

I now ran into the error message (RuntimeError) bitcode module not found at ./opencl.bc, which I was able to resolve by setting ROCM_PATH=/opt/rocm (mentioning this in case someone else runs into it).
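To check whether a given ROCm root actually contains the bitcode file named in that error before exporting `ROCM_PATH`, a small sketch follows. The helper name `find_rocm_bitcode` is hypothetical, and `/opt/rocm` is simply the path from this comment.

```shell
# Hypothetical helper: list every opencl.bc found under a ROCm root.
# -H makes find follow the root itself if it is a symlink.
# Prints nothing if the file is missing or the directory does not exist.
find_rocm_bitcode() {
  find -H "${1:-/opt/rocm}" -name 'opencl.bc' 2>/dev/null
}
```

If `find_rocm_bitcode /opt/rocm` prints a path, `export ROCM_PATH=/opt/rocm` before starting the application is a reasonable thing to try.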

Now I'm running into the following error, which soon afterwards causes the OS to send a SIGABRT:

2023-12-31 18:56:44.607676: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fe7ac1707a0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.

...

beam.smp: /usr/src/debug/hip-runtime-amd/clr-rocm-5.7.1/hipamd/src/hip_code_object.cpp:762: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.

Old Reply

do you do any Nx stuff on boot?

I've added a serving of openai/whisper to my application's supervision tree, but that should be all

    {:ok, model_info} = Bumblebee.load_model({:hf, @whisper_model})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, @whisper_model})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, @whisper_model})
    {:ok, generation_config} = Bumblebee.load_generation_config({:hf, @whisper_model})
    generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

    serving =
      Bumblebee.Audio.speech_to_text_whisper(
        model_info,
        featurizer,
        tokenizer,
        generation_config,
        compile: [batch_size: 4],
        chunk_num_seconds: 30,
        stream: true,
        defn_options: [compiler: EXLA]
      )

Does the error happen every time? I assume it doesn't happen if you use CPU only?

Yes, it happens every time, before the serving is able to complete a single run.

Awlexus, Dec 31 '23 18:12

Hmm, this looks like /opt/rocm is likely a symlink to a more specific version like /opt/rocm-5.7.1, let's set ROCM_PATH to that just to be sure. Otherwise maybe there's a certain ROCM HIP package missing in the environment?
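One way to see what `/opt/rocm` points at is sketched below, assuming a `readlink` that supports `-f`. `resolve_rocm_path` is a hypothetical helper name.

```shell
# Hypothetical helper: print the real location of a ROCm root,
# following any symlinks (for a plain directory it prints the same path).
resolve_rocm_path() {
  readlink -f "${1:-/opt/rocm}"
}
```

The result can be exported directly, for example `export ROCM_PATH="$(resolve_rocm_path)"`.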

jonatanklosko, Jan 02 '24 09:01

I'm running Arch Linux and rely on the packages provided there, so I'm not sure what I could be missing. I have installed every package that pops up when I search for rocm, but just to be sure I've provided a list of the installed packages below.

Hmm, this looks like /opt/rocm is likely a symlink to a more specific version

/opt/rocm really just links to the packages installed on my system.

$ ls -lah /opt
drwxr-xr-x 34 root root 4.0K Dec 31 18:53  rocm/

Installed rocm packages
  • comgr 5.7.1-1 Compiler support library for ROCm LLVM
  • hip-runtime-amd 5.7.1-1 Heterogeneous Interface for Portability ROCm
  • hipblas 5.7.1-1 ROCm BLAS marshalling library
  • hsa-rocr 5.7.1-1 HSA Runtime API and runtime for ROCm
  • magma-hip 2.7.2-2 Matrix Algebra on GPU and Multicore Architectures (with ROCm/HIP)
  • python-pytorch-rocm 2.1.2-1 Tensors and Dynamic neural networks in Python with strong GPU acceleration (with ROCm)
  • python-torchvision-rocm 0.16.2-1 Datasets, transforms, and models specific to computer vision (with ROCM support)
  • rccl 5.7.1-1 ROCm Communication Collectives Library
  • rocalution 5.7.1-1 Next generation library for iterative sparse solvers for ROCm platform
  • rocblas 5.7.1-1 Next generation BLAS implementation for ROCm platform
  • rocfft 5.7.1-1 Next generation FFT implementation for ROCm
  • rocm-clang-ocl 5.7.1-1 OpenCL compilation with clang compiler
  • rocm-cmake 5.7.1-1 CMake modules for common build tasks needed for the ROCm software stack
  • rocm-core 5.7.1-1 AMD ROCm core package (version files)
  • rocm-device-libs 5.7.1-1 ROCm Device Libraries
  • rocm-hip-libraries 5.7.1-2 Develop certain applications using HIP and libraries for AMD platforms
  • rocm-hip-runtime 5.7.1-2 Packages to run HIP applications on the AMD platform
  • rocm-hip-sdk 5.7.1-2 Develop applications using HIP and libraries for AMD platforms
  • rocm-language-runtime 5.7.1-2 ROCm runtime
  • rocm-llvm 5.7.1-1 Radeon Open Compute - LLVM toolchain (llvm, clang, lld)
  • rocm-ml-libraries 5.7.1-2 Packages for key Machine Learning libraries
  • rocm-ml-sdk 5.7.1-2 develop and run Machine Learning applications optimized for AMD platforms
  • rocm-opencl-runtime 5.7.1-1 OpenCL implementation for AMD
  • rocm-opencl-sdk 5.7.1-2 Develop OpenCL-based applications for AMD platforms
  • rocm-smi-lib 5.7.1-1 ROCm System Management Interface Library
  • rocminfo 5.7.1-1 ROCm Application for Reporting System Info
  • rocrand 5.7.1-1 Pseudo-random and quasi-random number generator on ROCm
  • rocsolver 5.7.1-1 Subset of LAPACK functionality on the ROCm platform
  • rocsparse 5.7.1-1 BLAS for sparse computation on top of ROCm
  • rocthrust 5.7.1-1 Port of the Thrust parallel algorithm library atop HIP/ROCm
  • roctracer 5.7.1-1 ROCm tracer library for performance tracing

Awlexus, Jan 04 '24 13:01

I see. It must be something environment related, given that others managed to run it with that revision, but I don't have any more guesses right now.

One alternative would be running stuff inside Docker, though that's not exactly convenient. Or you could try building with the latest openxla revision to see if it's something fixed upstream, but note that this usually requires some adjustments in the build file and/or in exla (depending on how much the xla APIs changed).

jonatanklosko, Jan 05 '24 04:01

We just had a new release, see https://github.com/elixir-nx/xla/issues/82#issuecomment-2124230058. You can try it with ROCm 6.0, and if there are issues, leave a comment on #82 :)

jonatanklosko, May 22 '24 08:05