[platform set error][GPU-cuda]NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?

Open FatJhon opened this issue 1 year ago • 8 comments

./configure.py --backend=CUDA

bazel build --test_output=all --spawn_strategy=sandboxed //xla/...

When I look up the CUDA platform with TF_ASSERT_OK_AND_ASSIGN(se::Platform * platform, PlatformUtil::GetPlatform("cuda")); I get this error: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in? How can I solve it?

FatJhon avatar Jul 17 '24 08:07 FatJhon

Can you share the contents of your xla_configure.bazelrc file (and if present also the contents of your .tf_configure.bazelrc file)? Both should live in your root XLA directory.

beckerhe avatar Jul 18 '24 14:07 beckerhe

Thanks a lot for replying! Only xla_configure.bazelrc exists here, and its content is as follows:

build --action_env CLANG_COMPILER_PATH=/home/weight/tools/llvm-17.x/bin/clang-17
build --repo_env CC=/home/weight/tools/llvm-17.x/bin/clang-17
build --repo_env BAZEL_COMPILER=/home/weight/tools/llvm-17.x/bin/clang-17
build --linkopt --ld-path=/home/weight/tools/llvm-17.x/bin/ld.lld
build --config nvcc_clang
build --action_env CLANG_CUDA_COMPILER_PATH=/home/weight/tools/llvm-17.x/bin/clang-17
build --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-12.4
build --action_env TF_CUBLAS_VERSION=12.4.5
build --action_env TF_CUDA_COMPUTE_CAPABILITIES=8.6,8.9
build --action_env TF_CUDNN_VERSION=9
build --repo_env TF_NEED_TENSORRT=0
build --action_env TF_NCCL_VERSION=2
build --action_env PYTHON_BIN_PATH=/usr/bin/python3
build --python_path /usr/bin/python3
test --test_env LD_LIBRARY_PATH
test --test_size_filters small,medium
build --copt -Wno-sign-compare
build --copt -Wno-error=unused-command-line-argument
build --copt -Wno-gnu-offsetof-extensions
build --build_tag_filters -no_oss
build --test_tag_filters -no_oss
test --build_tag_filters -no_oss
test --test_tag_filters -no_oss

More details and a similar question are in #15054. I also tried running the GPU tests in XLA, but that did not work either: Bazel returned "No test targets were found, yet testing was requested." Thanks.

FatJhon avatar Jul 18 '24 14:07 FatJhon

Hmm. This all looks good. Would you mind sharing the entire Bazel output?

I'm mainly interested in what tests are failing with that error message.

beckerhe avatar Jul 18 '24 15:07 beckerhe

I finally solved this problem by modifying the BUILD file for stablehlo_compile_test.cc. This test is only set up for CPU, so the GPU compiler is not registered.

FatJhon avatar Jul 23 '24 03:07 FatJhon

> I finally solved this problem by modifying the BUILD file for stablehlo_compile_test.cc. This test is only set up for CPU, so the GPU compiler is not registered.

I ran into the same problem. How exactly did you modify it?

huhuiqi7 avatar Aug 01 '24 02:08 huhuiqi7

@FatJhon Would you be able to share what exactly you changed? This might help others like @huhuiqi7.

beckerhe avatar Aug 09 '24 07:08 beckerhe

@huhuiqi7 @beckerhe Sorry, I only saw this now. The change is as follows: [screenshots: img_v3_02dj_57d0e0e3-20e6-4a40-bc0f-269b785b050g, img_v3_02dj_af3f2d35-747d-4c5d-8881-bbb2d5fcb49g]

FatJhon avatar Aug 09 '24 07:08 FatJhon

Ah, okay, now I understand the problem. Yes, you need to link the gpu_plugin (or something that links the gpu_plugin); otherwise StreamExecutor will tell you the platform is not available.

@huhuiqi7 Does that help you as well? If not, can you share some more details, such as the exact error messages and build logs?

beckerhe avatar Aug 09 '24 07:08 beckerhe
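
For readers wondering why a missing link dependency shows up as a runtime NOT_FOUND error, the following toy sketch may help. It is not XLA's actual code: every name in it (Registry, Registrar, GetCompilerOrError, the "Host" and "CUDA" keys, the LINK_GPU_PLUGIN macro) is hypothetical. It only models the general pattern of link-time registration, where a backend adds itself to a global registry from a static initializer, so the backend is findable only if the library containing that initializer ends up in the binary. In the thread above, that presumably corresponds to adding the gpu_plugin target mentioned by beckerhe to the test's deps in its BUILD file; the exact change was only posted as screenshots.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiler factory; returns a label for brevity.
using CompilerFactory = std::function<std::string()>;

// Global registry keyed by platform name. A function-local static avoids
// static-initialization-order problems between translation units.
std::map<std::string, CompilerFactory>& Registry() {
  static std::map<std::string, CompilerFactory> registry;
  return registry;
}

// Constructing a Registrar at namespace scope registers a backend before
// main() runs, but only if the object file defining it is linked in.
struct Registrar {
  Registrar(const std::string& platform, CompilerFactory factory) {
    assert(factory && "factory must be callable");
    Registry()[platform] = std::move(factory);
  }
};

// Returns the compiler label, or an error string modeled on the one in
// this issue when nothing registered itself for the platform.
std::string GetCompilerOrError(const std::string& platform) {
  auto it = Registry().find(platform);
  if (it == Registry().end()) {
    return "NOT_FOUND: could not find registered compiler for platform " +
           platform + " -- was support for that platform linked in?";
  }
  return it->second();
}

// Pretend this line lives in the CPU plugin library, which the test links.
static Registrar cpu_registrar("Host", [] { return std::string("cpu compiler"); });

// Pretend this line lives in the GPU plugin library. The macro emulates
// "the library was not linked": without it, no CUDA entry ever exists.
#ifdef LINK_GPU_PLUGIN
static Registrar gpu_registrar("CUDA", [] { return std::string("gpu compiler"); });
#endif
```

Built without LINK_GPU_PLUGIN, a lookup for "Host" succeeds while a lookup for "CUDA" returns the NOT_FOUND string, which mirrors a CPU-only test target asking for the CUDA platform. Defining the macro plays the role of adding the GPU plugin dependency.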