Build failure with TensorFlow Addons 0.20

Open npanpaliya opened this issue 1 year ago • 7 comments

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux x86_64
  • TensorFlow version and how it was installed (source or binary): 2.12 via conda package of TF (Built using https://github.com/open-ce/tensorflow-feedstock)
  • TensorFlow-Addons version and how it was installed (source or binary): 0.20 (Built from source)
  • Python version: Python 3.10
  • Is GPU used? (yes/no): yes

Describe the bug

While building TF Addons 0.20 with TF 2.12, CUDA 11.8 and cuDNN 8.8.1, I see the following build failure:

In file included from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/config.h:33,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/execution_policy.h:35,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/detail/iterator_facade_category.h:22,
                 from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/iterator/iterator_facade.h:37,
                 from external/cub_archive/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from external/cub_archive/cub/device/device_reduce.cuh:41,
                 from tensorflow_addons/custom_ops/layers/cc/kernels/correlation_cost_op_gpu.cu.cc:20:
/usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
   46 | #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
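
The include chain above passes through two header trees that each contain a `cub/` directory: the `external/cub_archive` copy fetched by Bazel and the copy under the CUDA 11.8 toolkit. A minimal Python sketch for spotting this in a build log follows. The function name `cub_roots` is hypothetical, the log excerpt is copied from the trace above, and whether the mixed copies are the actual cause of the `#error` is an assumption, although it would be consistent with the report later in this thread that removing cub from WORKSPACE avoids the failure.

```python
import re

# Excerpt of the include chain from the build log in this issue.
LOG = """\
In file included from /usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/thrust/system/cuda/config.h:33,
                 from external/cub_archive/cub/device/../iterator/arg_index_input_iterator.cuh:48,
                 from external/cub_archive/cub/device/device_reduce.cuh:41,
                 from tensorflow_addons/custom_ops/layers/cc/kernels/correlation_cost_op_gpu.cu.cc:20:
/usr/local/cuda-11.8/bin/../targets/x86_64-linux/include/cub/util_namespace.cuh:46:2: error: #error CUB requires a definition of CUB_NS_QUALIFIER when CUB_NS_PREFIX/POSTFIX are defined.
"""


def cub_roots(log: str) -> set[str]:
    """Return the distinct directories that hold a `cub/` header tree in the log."""
    roots = set()
    # Capture everything before "/cub/" in any path that ends in a .cuh header.
    for match in re.finditer(r"(\S*?)/cub/[\w./]+\.cuh", log):
        roots.add(match.group(1))
    return roots


if __name__ == "__main__":
    roots = cub_roots(LOG)
    for root in sorted(roots):
        print("CUB headers found under:", root)
    if len(roots) > 1:
        print("More than one CUB copy appears in the include chain.")
```

If more than one root is reported, the translation unit is pulling CUB headers from more than one location, which is worth ruling out before changing compiler flags.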

My .bazelrc looks like

build --action_env TF_HEADER_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow/include"
build --action_env TF_SHARED_LIBRARY_DIR="/opt/conda/envs/testaddons/lib/python3.10/site-packages/tensorflow"
build --action_env TF_SHARED_LIBRARY_NAME="libtensorflow_framework.so.2"
build --action_env TF_CXX11_ABI_FLAG="1"
build --action_env TF_CPLUSPLUS_VER="c++17"
build --spawn_strategy=standalone
build --strategy=Genrule=standalone
build --experimental_repo_remote_exec
build -c opt
build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
build --copt=-mavx
build --cxxopt=-std=c++17
build --host_cxxopt=-std=c++17
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda,/opt/conda/envs/testaddons,/usr/include"
build --action_env CUDNN_INSTALL_PATH="/opt/conda/envs/testaddons"
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
test --config=cuda
build --config=cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.8-cudnn8.6-tensorrt8.4_config_cuda//crosstool:toolchain
build --action_env PYTHON_BIN_PATH="/opt/conda/envs/testaddons/bin/python"
build --action_env PYTHON_LIB_PATH="/opt/conda/envs/testaddons/lib/python3.10/site-packages"
build --python_path="/opt/conda/envs/testaddons/bin/python"
build --action_env GCC_HOST_COMPILER_PATH="/opt/conda/envs/testaddons/bin/x86_64-conda-linux-gnu-cc"
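
When reviewing a .bazelrc like the one above, it can help to list the `--action_env` values side by side, for example to compare the CUDA and cuDNN versions against the toolchain name. The following is a rough sketch only: `action_env` is a hypothetical helper, the sample text is a subset of the lines above, and it assumes every line has balanced quotes (otherwise `shlex.split` raises `ValueError`).

```python
import shlex

# A few lines taken from the .bazelrc shown in this issue.
SAMPLE = '''\
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda,/opt/conda/envs/testaddons,/usr/include"
build --action_env TF_CUDA_VERSION="11"
build --action_env TF_CUDNN_VERSION="8.8"
build --cxxopt=-std=c++17
'''


def action_env(bazelrc_text: str) -> dict[str, str]:
    """Collect KEY=VALUE pairs passed as `--action_env KEY=VALUE`."""
    env = {}
    for line in bazelrc_text.splitlines():
        tokens = shlex.split(line)  # strips the surrounding double quotes
        # Look at each (flag, following token) pair on the line.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "--action_env" and "=" in value:
                key, _, val = value.partition("=")
                env[key] = val
    return env


if __name__ == "__main__":
    for key, val in action_env(SAMPLE).items():
        print(f"{key} = {val}")
```

The sketch handles only the space-separated `--action_env KEY=VALUE` form used in this file, not `--action_env=KEY=VALUE`.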

Code to reproduce the issue

Build command: bazel build -s --enable_runfiles build_pip_pkg

Please provide some help to get rid of this build error.

npanpaliya, Apr 10 '23 05:04

@seanpmorgan - Could you please provide some pointers?

npanpaliya, Apr 11 '23 04:04

Does anyone have any pointers to fix this issue?

npanpaliya, Apr 13 '23 14:04

It seems similar to https://github.com/dmlc/xgboost/issues/7378, which was fixed by https://github.com/dmlc/xgboost/pull/7379

bhack, Apr 18 '23 20:04

Okay, thanks @bhack. I'll give this a try.

npanpaliya, Apr 19 '23 06:04

I'm running into the same issue when building TF Addons 0.19 with CUDA 11.8. What config should be used in this case? In my case, removing cub from WORKSPACE, similar to #2821, works. @seanpmorgan, may I know the reason for the cub removal in that PR?

MrAta, Oct 17 '23 20:10

I have this issue in another project. I tried CUDA 10.1 and 12.3 and got the same error, but there is no error with CUDA 11.4.

854768750, Nov 08 '23 23:11

Same issue with CUDA 10.8

fuhailin, Nov 18 '23 14:11