issues with building tfx r2.8 from source with mkl support

Open hongshanli23 opened this issue 3 years ago • 13 comments

Describe the problem the feature is intended to solve

When building tfx r2.8-rc0 with mkl support, I see the following issue:

ERROR: /root/.cache/bazel/_bazel_root/c206fe4b7a49887ed31d86472abc6776/external/org_tensorflow/tensorflow/core/common_runtime/BUILD:1739:11: Couldn't build file external/org_tensorflow/tensorflow/core/common_runtime/_objs/threadpool_device/threadpool_device.o: C++ compilation of rule '@org_tensorflow//tensorflow/core/common_runtime:threadpool_device' failed (Exit 1): gcc failed: error executing command 
  (cd /root/.cache/bazel/_bazel_root/c206fe4b7a49887ed31d86472abc6776/execroot/tf_serving && \
  exec env - \
    LD_LIBRARY_PATH='/usr/local/lib:$LD_LIBRARY_PATH' \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/core/common_runtime/_objs/threadpool_device/threadpool_device.d '-frandom-seed=bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/core/common_runtime/_objs/threadpool_device/threadpool_device.o' -DTF_USE_SNAPPY -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DHAVE_SYS_UIO_H -iquoteexternal/org_tensorflow -iquotebazel-out/k8-opt/bin/external/org_tensorflow -iquoteexternal/com_google_absl -iquotebazel-out/k8-opt/bin/external/com_google_absl -iquoteexternal/nsync -iquotebazel-out/k8-opt/bin/external/nsync -iquoteexternal/eigen_archive -iquotebazel-out/k8-opt/bin/external/eigen_archive -iquoteexternal/gif -iquotebazel-out/k8-opt/bin/external/gif -iquoteexternal/libjpeg_turbo -iquotebazel-out/k8-opt/bin/external/libjpeg_turbo -iquoteexternal/com_google_protobuf -iquotebazel-out/k8-opt/bin/external/com_google_protobuf -iquoteexternal/zlib -iquotebazel-out/k8-opt/bin/external/zlib -iquoteexternal/com_googlesource_code_re2 -iquotebazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquoteexternal/farmhash_archive -iquotebazel-out/k8-opt/bin/external/farmhash_archive -iquoteexternal/fft2d -iquotebazel-out/k8-opt/bin/external/fft2d -iquoteexternal/highwayhash -iquotebazel-out/k8-opt/bin/external/highwayhash -iquoteexternal/double_conversion -iquotebazel-out/k8-opt/bin/external/double_conversion -iquoteexternal/snappy -iquotebazel-out/k8-opt/bin/external/snappy -isystem external/nsync/public -isystem bazel-out/k8-opt/bin/external/nsync/public -isystem external/org_tensorflow/third_party/eigen3/mkl_include -isystem bazel-out/k8-opt/bin/external/org_tensorflow/third_party/eigen3/mkl_include -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem 
external/gif -isystem bazel-out/k8-opt/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt/bin/external/com_google_protobuf/src -isystem external/zlib -isystem bazel-out/k8-opt/bin/external/zlib -isystem external/farmhash_archive/src -isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -isystem external/double_conversion -isystem bazel-out/k8-opt/bin/external/double_conversion -mavx -msse4.2 '-std=c++14' '-D_GLIBCXX_USE_CXX11_ABI=0' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions -DINTEL_MKL -DENABLE_MKL -DENABLE_ONEDNN_OPENMP -msse3 -DTENSORFLOW_MONOLITHIC_BUILD -pthread -fopenmp -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/org_tensorflow/tensorflow/core/common_runtime/threadpool_device.cc -o bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/core/common_runtime/_objs/threadpool_device/threadpool_device.o)
Execution platform: @local_execution_config_platform//:platform
external/org_tensorflow/tensorflow/core/common_runtime/threadpool_device.cc:19:10: fatal error: external/llvm_openmp/include/omp.h: No such file or directory
   19 | #include "external/llvm_openmp/include/omp.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build
INFO: Elapsed time: 0.945s, Critical Path: 0.02s
INFO: 3 processes: 3 internal.
FAILED: Build did NOT complete successfully
cp: cannot stat 'bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server': No such file or directory

Describe alternatives you've considered

If I remove the mkl build flag, the build succeeds.

System information

Please build the following Dockerfile. It is adapted from the CPU devel image. In particular, see lines 105 and 111 for the build options.

# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_IMAGE=ubuntu:20.04
FROM $BASE_IMAGE as base_build

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Los_Angeles

ARG TF_SERVING_VERSION_GIT_BRANCH=master
ARG TF_SERVING_VERSION_GIT_COMMIT=HEAD

LABEL maintainer="Abolfazl Shahbazi <[email protected]>"
LABEL tensorflow_serving_github_branchtag=${TF_SERVING_VERSION_GIT_BRANCH}
LABEL tensorflow_serving_github_commit=${TF_SERVING_VERSION_GIT_COMMIT}

RUN apt-get update && apt-get install -y --no-install-recommends \
        automake \
        build-essential \
        ca-certificates \
        curl \
        git \
        libcurl3-dev \
        libfreetype6-dev \
        libpng-dev \
        libtool \
        libzmq3-dev \
        mlocate \
        openjdk-8-jdk\
        openjdk-8-jre-headless \
        pkg-config \
        python-dev \
        software-properties-common \
        swig \
        unzip \
        wget \
        zip \
        zlib1g-dev \
        python3-distutils \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py

# Install python
ARG PYTHON=python3.8
ENV PYTHON=$PYTHON
RUN add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && apt-get install -y \
    ${PYTHON} ${PYTHON}-dev python3-pip ${PYTHON}-venv && \
    rm -rf /var/lib/apt/lists/* && \
    ${PYTHON} -m pip install pip --upgrade && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/${PYTHON} 0

# Make ${PYTHON} the default python version
RUN update-alternatives --install /usr/bin/python python /usr/bin/${PYTHON} 0

# Version specifiers are quoted so the shell does not treat ">=" as a redirection.
RUN $PYTHON -m pip --no-cache-dir install \
    "future>=0.17.1" \
    grpcio \
    h5py \
    "keras_applications>=1.0.8" \
    "keras_preprocessing>=1.1.0" \
    mock \
    numpy \
    portpicker \
    requests \
    --ignore-installed "six>=1.12.0"

# Set up Bazel
ARG BAZEL_VERSION=4.2.1
ENV BAZEL_VERSION=${BAZEL_VERSION}
WORKDIR /
RUN mkdir /bazel && \
    cd /bazel && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
    chmod +x bazel-*.sh && \
    ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    cd / && \
    rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh

# Download TF Serving sources (optionally at specific commit).
# WORKDIR /tensorflow-serving
# RUN curl -sSL --retry 5 https://github.com/tensorflow/serving/tarball/${TF_SERVING_VERSION_GIT_COMMIT} | tar --strip-components=1 -xzf -

RUN git clone -b r2.8 https://github.com/tensorflow/serving.git /tensorflow_serving 

WORKDIR /tensorflow_serving
# FROM base_build as binary_build
# Build, and install TensorFlow Serving
ARG TF_SERVING_BUILD_OPTIONS="--config=mkl --config=release"

RUN echo "Building with build options: ${TF_SERVING_BUILD_OPTIONS}"
ARG TF_SERVING_BAZEL_OPTIONS=""
RUN echo "Building with Bazel options: ${TF_SERVING_BAZEL_OPTIONS}"

RUN bazel build --color=yes --curses=yes \
    ${TF_SERVING_BAZEL_OPTIONS} \
    --verbose_failures \
    --output_filter=DONT_MATCH_ANYTHING \
    ${TF_SERVING_BUILD_OPTIONS} \
    tensorflow_serving/model_servers:tensorflow_model_server && \
    cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    /usr/local/bin/

# Build and install TensorFlow Serving API
RUN bazel build --color=yes --curses=yes \
    ${TF_SERVING_BAZEL_OPTIONS} \
    --verbose_failures \
    --output_filter=DONT_MATCH_ANYTHING \
    ${TF_SERVING_BUILD_OPTIONS} \
    tensorflow_serving/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package \
    /tmp/pip && \
    pip --no-cache-dir install --upgrade \
    /tmp/pip/tensorflow_serving_api-*.whl && \
    rm -rf /tmp/pip

# Copy openmp libraries
RUN cp /root/.cache/bazel/_bazel_root/*/execroot/tf_serving/bazel-out/k8-opt/bin/external/llvm_openmp/libiomp5.so /usr/local/lib/

ENV LIBRARY_PATH '/usr/local/lib:$LIBRARY_PATH'
ENV LD_LIBRARY_PATH '/usr/local/lib:$LD_LIBRARY_PATH'

# FROM binary_build as clean_build
# # Clean up Bazel cache when done.
RUN bazel clean --expunge --color=yes && \
    rm -rf /root/.cache
CMD ["/bin/bash"]

hongshanli23, Jan 03 '22 00:01

@hsl89,

Can you clarify whether you are trying to install tf 2.8.0-rc0 rather than tfx? The latest stable release of TFX is 1.5.0, not 2.8.0. If you're looking to build TensorFlow with MKL and not tfx, you can also use this guide as reference. Thanks

sanatmpa1, Jan 03 '22 17:01

@hsl89,

Closing this issue due to lack of recent activity. Please feel free to reopen the issue with more details if you still have questions. Thanks!

sanatmpa1, Jan 27 '22 12:01

@sanatmpa1

Sorry for the late reply. No, I was trying to build the r2.8 branch of tensorflow/serving (this repo) from source, and I hit the error posted in the description when building with the mkl flag.

hongshanli23, Feb 03 '22 23:02

This is happening for me too. Can we re-open the issue? @sanatmpa1

haitong, May 02 '22 16:05

@pindinagesh any updates?

haitong, May 04 '22 16:05

cc: @TensorFlow-MKL @agramesh1

penpornk, May 04 '22 18:05

@hsl89 FYI, we have incorporated oneDNN (MKL) support into official TensorFlow x86 builds since TF 2.5. You can build TF with normal config (without --config=mkl) and turn on oneDNN optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1. (And disable it by setting it to 0.)
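
As a minimal sketch of the toggle described above: the variable is set in the environment of the process that loads TensorFlow, so with TF Serving it would be exported before the server starts. The model name and path in the commented command are hypothetical placeholders, not values from this thread.

```shell
# Enable oneDNN optimizations for processes started from this shell.
# Set the value to 0 to turn them off again.
export TF_ENABLE_ONEDNN_OPTS=1
echo "TF_ENABLE_ONEDNN_OPTS=${TF_ENABLE_ONEDNN_OPTS}"

# Then start the server as usual, for example (hypothetical model name and path):
# tensorflow_model_server --port=8500 --model_name=my_model --model_base_path=/models/my_model
```

In a Dockerfile, the equivalent would presumably be an `ENV TF_ENABLE_ONEDNN_OPTS=1` line in the runtime image.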

penpornk, May 04 '22 18:05

More info about the TF_ENABLE_ONEDNN_OPTS flag here.

penpornk, May 04 '22 18:05

cc: @TensorFlow-MKL @agramesh1

CCing @ashahba

agramesh1, May 04 '22 23:05

@penpornk Thanks for the pointers! FYI, I removed --config=mkl and set TF_ENABLE_ONEDNN_OPTS=1, and now I can build tfserving with tensorflow.

The mkl build still fails (not related to your change) because of this step (see https://github.com/tensorflow/serving/blob/c0998e13451b9b83c9bdf157dd3648b2272dac59/tensorflow_serving/tools/docker/Dockerfile.devel-mkl#L123-L124):

# Copy openmp libraries
RUN cp /root/.cache/bazel/_bazel_root/*/execroot/tf_serving/bazel-out/k8-opt/bin/external/llvm_openmp/libiomp5.so /usr/local/lib/

and there is no such file as /root/.cache/bazel/_bazel_root/*/execroot/tf_serving/bazel-out/k8-opt/bin/external/llvm_openmp/libiomp5.so.

I think the tfserving team should be able to look into where this .so file ends up after your oneDNN change.
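
One way to check where the library ends up, if it is built at all, is to search the Bazel cache instead of relying on the hard-coded path. This is only a sketch: the helper name `find_iomp5` is made up for illustration, `-quit` assumes GNU find (as on the Ubuntu base image), and `/root/.cache/bazel` is taken from the Dockerfile above.

```shell
find_iomp5() {
  # Print the first libiomp5.so found under directory $1, or nothing if there is none.
  find "$1" -name 'libiomp5.so' -print -quit 2>/dev/null || true
}

# Cache location used by the Dockerfile in this issue; override BAZEL_CACHE if yours differs.
BAZEL_CACHE="${BAZEL_CACHE:-/root/.cache/bazel}"
lib_path="$(find_iomp5 "$BAZEL_CACHE")"

if [ -n "$lib_path" ]; then
  echo "libiomp5.so found at: $lib_path"
else
  echo "libiomp5.so not found under $BAZEL_CACHE"
fi
```

If nothing is found, the `cp` step in the Dockerfile has nothing to copy, which matches the failure described here.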

haitong, May 04 '22 23:05

Hi @penpornk, thanks for the update. Sorry I was not able to follow up on this thread sooner. I wonder what the difference is between --config=mkl and --config=mkl_open_source_only. In the .bazelrc we have

build:mkl --define=build_with_mkl=true --define=enable_mkl=true --define=build_with_openmp=true
build:mkl --define=tensorflow_mkldnn_contraction_kernel=0

# This config option is used to enable MKL-DNN open source library only,
# without depending on MKL binary version.
build:mkl_open_source_only --define=build_with_mkl_dnn_only=true
build:mkl_open_source_only --define=build_with_mkl=true --define=enable_mkl=true
build:mkl_open_source_only --define=tensorflow_mkldnn_contraction_kernel=0

Is it true that if we set --config=mkl, then TF will try to build against the closed source version of mkl?
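
To compare the two configs side by side, the lines for each can be filtered out of .bazelrc. This is a sketch only: `show_config` is a hypothetical helper name, and it shows which defines each config carries, not what those defines do inside the TensorFlow build.

```shell
show_config() {
  # Print every line of the bazelrc file ($2) that belongs to the config named $1.
  # The trailing space in the pattern keeps "mkl" from also matching "mkl_open_source_only".
  grep -E "^build:$1 " "$2" || true
}

# Usage, from the root of the serving checkout:
# show_config mkl .bazelrc
# show_config mkl_open_source_only .bazelrc
```

On the lines quoted above, the visible difference is that `mkl` adds `--define=build_with_openmp=true`, while `mkl_open_source_only` adds `--define=build_with_mkl_dnn_only=true`.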

hongshanli23, May 05 '22 00:05

Hi @hsl89, please take a look at that PR; it should fix your issue.

ashahba, May 05 '22 01:05

@hsl89,

Can you please try building with the Dockerfile and let us know if it works? Thank you!

singhniraj08, Sep 14 '22 09:09

Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and post your comments if you still have questions. Thank you!

singhniraj08, Nov 18 '22 05:11

@penpornk @agramesh1 I am also experiencing the same issue while trying to remove all MKL-ML-related configuration. The included file "omp.h" does not exist in the specified llvm_openmp folder.

I will keep you updated if there is any progress.

gzmkl, May 01 '23 21:05