addons
Unable to build from source with GPU support
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- TensorFlow version and how it was installed (source or binary): source
- TensorFlow-Addons version and how it was installed (source or binary): source
- Python version: 3.8.10
- Is GPU used? (yes/no): yes
Describe the bug
I downloaded TF 2.9.1 and built it from source with GPU support without errors. However, building tensorflow_addons (0.17.0) from source fails with the following error:
DEBUG: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/org_tensorflow/third_party/repo.bzl:124:14:
Warning: skipping import of repository 'cub_archive' because it already exists.
DEBUG: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10:
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1596824487 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
/home/alex/jupyter/build/addons/WORKSPACE:45:14: in <toplevel>
/home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/org_tensorflow/tensorflow/workspace0.bzl:107:34: in workspace
/home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_toolchains/repositories/repositories.bzl:35:23: in repositories
Repository rule git_repository defined at:
/home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
WARNING: /home/alex/.cache/bazel/_bazel_alex/f67f41e413892adc9e99d88ee1f21ae3/external/local_config_tf/BUILD:13345:8: target 'libtensorflow_framework.so.2' is both a rule and a file; please choose another name for the rule
INFO: Analyzed target //:build_pip_pkg (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/alex/jupyter/build/addons/tensorflow_addons/custom_ops/image/BUILD:7:18: Compiling tensorflow_addons/custom_ops/image/cc/kernels/adjust_hsv_in_yiq_op_gpu.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF ... (remaining 60 arguments skipped)
/dt9/usr/bin/gcc: No such file or directory
nvcc fatal : Failed to preprocess host compiler properties.
Target //:build_pip_pkg failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.429s, Critical Path: 0.12s
INFO: 25 processes: 25 internal.
FAILED: Build did NOT complete successfully
Code to reproduce the issue
git clone https://github.com/tensorflow/addons.git
cd addons
export TF_NEED_CUDA="1"
python3 ./configure.py
bazel clean --expunge
bazel build build_pip_pkg
But if I manually replace the crosstool_top value in .bazelrc with "@local_config_cuda//crosstool:toolchain", the build continues, and then another error occurs:
ERROR: /home/alex/jupyter/build/addons/tensorflow_addons/custom_ops/seq2seq/BUILD:7:18: Compiling tensorflow_addons/custom_ops/seq2seq/cc/kernels/beam_search_ops_gpu.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF ... (remaining 61 arguments skipped)
Traceback (most recent call last):
File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 269, in <module>
sys.exit(main())
File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 256, in main
return InvokeNvcc(leftover, log=args.cuda_log)
File "external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 207, in InvokeNvcc
nvccopts += r'-gencode=arch=compute_%s,\"code=sm_%s\" ' % (
TypeError: not all arguments converted during string formatting
It can be fixed by removing the last ", capability" here: https://github.com/tensorflow/addons/blob/master/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl#L208
After that, the build succeeds.
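For anyone wondering why that line raises, here is a minimal sketch that reproduces the TypeError outside the wrapper. The format string is copied from the traceback; the capability value "75" is only an illustrative assumption, not something taken from the build.

```python
# The format string has two %s placeholders, but the wrapper passes three values.
template = r'-gencode=arch=compute_%s,\"code=sm_%s\" '
capability = "75"  # hypothetical compute capability, for illustration only

try:
    # Mirrors the failing call: one argument too many for the template.
    template % (capability, capability, capability)
except TypeError as exc:
    error_message = str(exc)

# With the extra argument removed, the formatting succeeds.
fixed = template % (capability, capability)

print(error_message)
print(fixed)
```

The second print shows the -gencode option that nvcc would receive for that capability once the extra argument is dropped.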
Can you try to submit a PR?
@seanpmorgan Do we still need these build_deps with the new 2.9 toolchain?
This is not a proper PR, so I'll just put in a patch. I don't know if this has any side effects. I guess it will fail in the official manylinux build process.
From 2f32601be926472f142bffbe820a28d05682219a Mon Sep 17 00:00:00 2001
From: Bernhard Bermeitinger <[email protected]>
Date: Fri, 27 May 2022 10:52:13 +0200
Subject: [PATCH] fix compilation on cuda
Signed-off-by: Bernhard Bermeitinger <[email protected]>
---
.../crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl | 2 +-
configure.py | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl b/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
index affc0be..3b5fd82 100644
--- a/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
+++ b/build_deps/toolchains/gpu/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc.tpl
@@ -205,7 +205,7 @@ def InvokeNvcc(argv, log=False):
x.replace(".", "") for x in supported_cuda_compute_capabilities])
for capability in supported_cuda_compute_capabilities[:-1]:
nvccopts += r'-gencode=arch=compute_%s,\"code=sm_%s\" ' % (
- capability, capability, capability)
+ capability, capability)
if supported_cuda_compute_capabilities:
capability = supported_cuda_compute_capabilities[-1]
nvccopts += r'-gencode=arch=compute_%s,code=\"sm_%s,compute_%s\" ' % (
diff --git a/configure.py b/configure.py
index 0d65e88..24fd2d5 100644
--- a/configure.py
+++ b/configure.py
@@ -185,7 +185,7 @@ def configure_cuda():
write("build --config=cuda")
write("build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true")
write(
- "build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda//crosstool:toolchain"
+ "build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain"
)
--
2.36.1
Save it as fix_cuda.patch and apply it with patch -p1 -i fix_cuda.patch.
Ugly fix on Ubuntu 20.04:
sudo mkdir -p /dt9/usr
sudo ln -s /usr/bin /dt9/usr/bin
Is @ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda only intended to build from a docker image ?
It is mainly for producing manylinux2014-compatible wheels. But as we don't want to maintain too many build configs, we rely on this.
@bhack, this issue is still not resolved. It is still necessary to manually replace the crosstool_top value in .bazelrc with "@local_config_cuda//crosstool:toolchain". I think it should either be set automatically when building outside Docker or be specifiable via arguments to the "configure" command, and it should be documented in the README.
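To make the suggestion concrete, here is a rough sketch of how the choice could be made when the .bazelrc line is generated. This is not what configure.py does today: the environment variable TFA_USE_LOCAL_CUDA_TOOLCHAIN and the helper cuda_crosstool_line are hypothetical names used only for illustration. The two toolchain labels are the ones already mentioned in this thread.

```python
import os

# Toolchain labels taken from this thread.
MANYLINUX_TOOLCHAIN = (
    "@ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda"
    "//crosstool:toolchain"
)
LOCAL_TOOLCHAIN = "@local_config_cuda//crosstool:toolchain"


def cuda_crosstool_line(env=None):
    """Return the .bazelrc line that selects the CUDA crosstool.

    TFA_USE_LOCAL_CUDA_TOOLCHAIN is an assumed opt-in switch, not an
    existing option of configure.py.
    """
    env = os.environ if env is None else env
    use_local = env.get("TFA_USE_LOCAL_CUDA_TOOLCHAIN", "0") == "1"
    toolchain = LOCAL_TOOLCHAIN if use_local else MANYLINUX_TOOLCHAIN
    return "build:cuda --crosstool_top=" + toolchain


# Example: opting in to the local toolchain when building outside Docker.
print(cuda_crosstool_line({"TFA_USE_LOCAL_CUDA_TOOLCHAIN": "1"}))
```

Keeping the manylinux toolchain as the default would leave the official wheel builds unchanged, while a local build could opt in explicitly.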
@shkarupa-alex It was closed automatically because it was linked to your PR by GitHub's "magic" keywords.