Failed to build model server with GPU support
Bug Report
Encountered errors when building tensorflow_model_server with r2.7.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): using tensorflow/serving:latest-devel-gpu
- TensorFlow Serving installed from (source or binary): source; the bug arises when building the binary
- TensorFlow Serving version: r2.7
Describe the problem
When building the model server binary with CUDA support inside tensorflow/serving:latest-devel-gpu, I got the following error:
ERROR: /src/tensorflow_serving/util/net_http/server/testing/BUILD:9:10: Couldn't build file
tensorflow_serving/util/net_http/server/testing/evhttp_echo_server: Linking of rule
'//tensorflow_serving/util/net_http/server/testing:evhttp_echo_server' failed (Exit 1):
crosstool_wrapper_driver_is_not_gcc failed: error executing command
Exact Steps to Reproduce
- Start a bash session in the container, mounting the relevant host directories:
docker run -it -w /src \
-v $SRC_DIR:/src \
-v $CACHE_VOL:/bazel \
-v $OUTPUT_VOL:/mnt \
-e HOST_PERMS="$(id -u):$(id -g)" \
tensorflow/serving:latest-devel-gpu bash
Environment variables:
- SRC_DIR: root directory of the tensorflow/serving repo
- CACHE_VOL: a host directory used to store the Bazel cache (in my case: ~/tf_serving_build_cache)
- OUTPUT_VOL: a host directory for the built binary (in my case: ~/tf_serving_out)
- Build with CUDA support:
set -ex
bazel --output_base /bazel build --config=cuda -c opt tensorflow_serving/... \
--verbose_failures
cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /mnt
bazel --output_base /bazel test --config=cuda tensorflow_serving/...
Source code / logs
ERROR: /src/tensorflow_serving/util/net_http/server/testing/BUILD:9:10: Couldn't build file tensorflow_serving/util/net_http/server/testing/evhttp_echo_server: Linking of rule '//tensorflow_serving/util/net_http/server/testing:evhttp_echo_server' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /bazel/execroot/tf_serving && \
exec env - \
LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:/usr/include/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
PWD=/proc/self/cwd \
TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_70,sm_75,compute_80 \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/tensorflow_serving/util/net_http/server/testing/evhttp_echo_server-2.params)
Execution platform: @local_execution_config_platform//:platform
/usr/bin/ld: bazel-out/k8-opt/bin/external/com_google_absl/absl/strings/libstrings.a(charconv.o): undefined reference to symbol 'nan@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 4.954s, Critical Path: 0.10s
INFO: 5 processes: 5 internal.
FAILED: Build did NOT complete successfully
Solution
Following #971, I was able to fix the issue by adding linkopts = ["-lm"] to the cc_binary rule in tensorflow_serving/util/net_http/socket/testing/BUILD. Is this the correct fix?
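For reference, the change amounts to something like the following in the affected BUILD file (a sketch only; the target's actual srcs and deps are elided and may differ):

```python
# Sketch of the fix: add linkopts = ["-lm"] to the failing cc_binary target
# so libm is on the link line and nan@@GLIBC_2.2.5 resolves.
cc_binary(
    name = "evhttp_echo_server",
    srcs = ["evhttp_echo_server.cc"],
    linkopts = ["-lm"],
    deps = [
        # ... existing dependencies unchanged ...
    ],
)
```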
@hsl89
As per my understanding, that should be fine; for better understanding, you can refer to a similar issue.
@hsl89
Could you please move this to closed status, as it is resolved? Thanks.
@pindinagesh I wouldn't call it resolved if there's no PR for it.
@hsl89 ,
I can see a merged PR which resolves a similar issue. Kindly let me know if this issue can be closed. Thank you!
Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments if you still have queries. Thank you!