Failed to build model server with GPU support

Open hongshanli23 opened this issue 3 years ago • 4 comments

Bug Report

Encountered errors when building tensorflow_model_server with r2.7.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Using tensorflow/serving:latest-devel-gpu

  • TensorFlow Serving installed from (source or binary): Source; the bug arose when I tried to build the binary

  • TensorFlow Serving version: r2.7

Describe the problem

When building the model server binary with CUDA support inside tensorflow/serving:latest-devel-gpu, I got the following error:

ERROR: /src/tensorflow_serving/util/net_http/server/testing/BUILD:9:10: Couldn't build file 
tensorflow_serving/util/net_http/server/testing/evhttp_echo_server: Linking of rule 
'//tensorflow_serving/util/net_http/server/testing:evhttp_echo_server' failed (Exit 1): 
crosstool_wrapper_driver_is_not_gcc failed: error executing command 

Exact Steps to Reproduce

  1. Start a bash shell in the container, mounting the required host directories:
    docker run -it -w /src \
        -v $SRC_DIR:/src \
        -v $CACHE_VOL:/bazel \
        -v $OUTPUT_VOL:/mnt \
        -e HOST_PERMS="$(id -u):$(id -g)" \
        tensorflow/serving:latest-devel-gpu bash

Environment variables:

SRC_DIR: root directory of the tensorflow/serving repo

CACHE_VOL: a host directory used to save bazel cache (in my case: ~/tf_serving_build_cache)

OUTPUT_VOL: a host directory used for built binary (in my case: ~/tf_serving_out)

  2. Build with CUDA support:
set -ex
bazel --output_base /bazel build --config=cuda -c opt tensorflow_serving/... \
    --verbose_failures

cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /mnt

bazel --output_base /bazel test --config=cuda tensorflow_serving/...

Source code / logs

ERROR: /src/tensorflow_serving/util/net_http/server/testing/BUILD:9:10: Couldn't build file tensorflow_serving/util/net_http/server/testing/evhttp_echo_server: Linking of rule '//tensorflow_serving/util/net_http/server/testing:evhttp_echo_server' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /bazel/execroot/tf_serving && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:/usr/include/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_70,sm_75,compute_80 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/tensorflow_serving/util/net_http/server/testing/evhttp_echo_server-2.params)
Execution platform: @local_execution_config_platform//:platform
/usr/bin/ld: bazel-out/k8-opt/bin/external/com_google_absl/absl/strings/libstrings.a(charconv.o): undefined reference to symbol 'nan@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 4.954s, Critical Path: 0.10s
INFO: 5 processes: 5 internal.
FAILED: Build did NOT complete successfully

Solution

Following #971, I was able to fix the issue by adding linkopts = ["-lm"] to the cc_binary in tensorflow_serving/util/net_http/socket/testing/BUILD. Is this the correct fix?
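
For illustration, a minimal sketch of what such a change might look like in a Bazel BUILD file. The target name is taken from the linker error above; the srcs and deps entries are placeholders rather than the repository's actual values, and the same attribute would go on whichever cc_binary fails to link against libm:

```starlark
cc_binary(
    name = "evhttp_echo_server",
    srcs = ["evhttp_echo_server.cc"],  # placeholder, not copied from the repo
    deps = [
        # ... the target's existing deps stay unchanged ...
    ],
    # Link libm explicitly so that math symbols such as nan(), referenced
    # by absl's charconv.o, resolve at link time.
    linkopts = ["-lm"],
)
```

Adding -lm to linkopts puts libm on the link command line, which is what the "DSO missing from command line" message from ld is asking for.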

hongshanli23 avatar Nov 06 '21 21:11 hongshanli23

@hsl89

As far as I understand, that should be fine. For more context, you can refer to a similar issue.

pindinagesh avatar Nov 08 '21 09:11 pindinagesh

@hsl89

Could you please close this issue, since it is resolved? Thanks.

pindinagesh avatar Nov 22 '21 05:11 pindinagesh

@pindinagesh I wouldn't call it resolved if there's no PR for it.

hongshanli23 avatar Nov 22 '21 06:11 hongshanli23

@hsl89 ,

I can see a merged PR that resolves a similar issue. Kindly let me know if this issue can be closed. Thank you!

singhniraj08 avatar Aug 23 '22 09:08 singhniraj08

Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments if you still have questions. Thank you!

singhniraj08 avatar Sep 23 '22 14:09 singhniraj08