
install failed

Open themoonstone opened this issue 1 year ago • 0 comments

Describe the bug: a fatal error occurred when I built a Docker image from my Dockerfile.

To Reproduce Steps to reproduce the behavior:

  1. the content of my Dockerfile:
COPY ../byteps ./byteps
RUN ls -alh ./byteps
ARG https_proxy
ARG http_proxy

ARG BYTEPS_BASE_PATH=/usr/local
ARG BYTEPS_PATH=$BYTEPS_BASE_PATH/byteps
ARG BYTEPS_GIT_LINK=https://github.com/bytedance/byteps
ARG BYTEPS_BRANCH=master

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
        build-essential \
        tzdata \
        ca-certificates \
        git \
        curl \
        wget \
        vim \
        cmake \
        lsb-release \
        libnuma-dev \
        ibverbs-providers \
        librdmacm-dev \
        ibverbs-utils \
        rdmacm-utils \
        libibverbs-dev \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        libnccl2=2.21.5-1+cuda12.2 \
        libnccl-dev=2.21.5-1+cuda12.2
#COPY --from=builder /etc/reslov.conf /etc/reslov.conf
# install framework
# note: for tf <= 1.14, you need gcc-4.9
RUN g++ --version
ARG FRAMEWORK=tensorflow
RUN if [ "$FRAMEWORK" = "tensorflow" ]; then \
        pip3 install --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple pip; \
        pip3 install tensorflow==2.5.0 -i https://pypi.tuna.tsinghua.edu.cn/simple; \
	pip3 install --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple setuptools; \
    elif [ "$FRAMEWORK" = "pytorch" ]; then \
        pip3 install -U numpy==1.18.1 torchvision==0.5.0 torch==1.4.0; \
    elif [ "$FRAMEWORK" = "mxnet" ]; then \
        pip3 install -U mxnet-cu100==1.5.0; \
    else \
        echo "unknown framework: $FRAMEWORK"; \
        exit 1; \
    fi
RUN ls -lh /byteps
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH
RUN cd $BYTEPS_BASE_PATH &&\
#COPY --form=builder /home/albert/tanyi4/github.com/bytedance/byteps $BYTEPS_PATH
#    git clone --recursive -b $BYTEPS_BRANCH $BYTEPS_GIT_LINK &&\
    cp /byteps ./byteps -r && \
    cd $BYTEPS_PATH &&\ 
    python3 setup.py install
  2. then build the image: docker build -t bytepsimage/tensorflow . -f Dockerfile --build-arg FRAMEWORK=tensorflow
  3. the error log is as follows:
Libraries have been installed in:
#13 78.85 | ^~~~~~~~
#13 78.88 byteps/server/server.cc: In function ‘void byteps::server::BytePSHandler(const ps::KVMeta&, const ps::KVPairs&, ps::KVServer)’:
#13 78.88 byteps/server/server.cc:350:15: warning: unused variable ‘update’ [-Wunused-variable]
#13 78.88 350 | auto& update = updates->merged;
#13 78.88 | ^~~~~~
#13 78.94 In file included from 3rdparty/ps-lite/include/ps/ps.h:13,
#13 78.94 from byteps/server/server.h:24,
#13 78.94 from byteps/server/server.cc:16:
#13 78.94 3rdparty/ps-lite/include/ps/kv_app.h: In instantiation of ‘ps::KVServer<Val>::KVServer(int, bool, int) [with Val = char]’:
#13 78.94 byteps/server/server.cc:501:62: required from here
#13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: warning: ‘new’ of type ‘ps::Customer’ with extended alignment 64 [-Waligned-new=]
#13 78.94 354 | this->obj_ = new Customer(
#13 78.94 | ^~~~~~~~~~~~~
#13 78.94 355 | app_id, app_id, std::bind(&KVServer::Process, this, 1), postoffice);
#13 78.94 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: note: uses ‘void operator new(std::size_t)’, which does not have an alignment parameter
#13 78.94 3rdparty/ps-lite/include/ps/kv_app.h:354:18: note: use ‘-faligned-new’ to enable C++17 over-aligned new support
#13 82.24 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/byteps/common/common.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/compressor_registry.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/error_feedback.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/dithering.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/onebit.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/randomk.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/topk.o build/temp.linux-x86_64-cpython-38/byteps/common/compressor/impl/vanilla_error_feedback.o build/temp.linux-x86_64-cpython-38/byteps/common/cpu_reducer.o build/temp.linux-x86_64-cpython-38/byteps/common/logging.o build/temp.linux-x86_64-cpython-38/byteps/server/server.o 3rdparty/ps-lite/build/libps.a 3rdparty/ps-lite/deps/lib/libzmq.a -L/usr/local/nccl/lib -L/usr/local/nccl/lib64 -L/usr/lib -lrdmacm -libverbs -lrt -o build/lib.linux-x86_64-cpython-38/byteps/server/c_lib.cpython-38-x86_64-linux-gnu.so -Wl,--version-script=byteps.lds -fopenmp
#13 82.46 INFO: Unable to build TensorFlow plugin, will skip it.
#13 82.46
#13 82.46 Traceback (most recent call last):
#13 82.46 File "setup.py", line 383, in check_tf_version
#13 82.46 import tensorflow as tf
#13 82.46 ModuleNotFoundError: No module named 'tensorflow'
#13 82.46
#13 82.46 During handling of the above exception, another exception occurred:
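
The traceback suggests that the python3 running setup.py cannot import tensorflow, even though the pip3 install step is in the Dockerfile. One way to narrow this down might be to run a small check with the same python3 inside the image. The following is only a diagnostic sketch and is not part of the original build; the helper name module_status is made up for illustration, and the list of modules checked is an assumption about what is relevant here.

```python
import importlib.util
import sys


def module_status(name: str) -> str:
    """Report whether `name` is importable by the interpreter running this script."""
    # find_spec returns None for a top-level module that cannot be located.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: NOT importable by {sys.executable}"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    # Show which interpreter is in use, so it can be compared with the
    # interpreter that pip3 installs into (see `pip3 --version`).
    print(sys.version)
    for mod in ("tensorflow", "setuptools", "pip"):
        print(module_status(mod))
```

If tensorflow is reported as not importable, comparing the path printed by `pip3 --version` with `sys.executable` may show whether pip3 and python3 point at different Python installations.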

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • GCC version: g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
  • CUDA and NCCL version: CUDA 12.2.0, NCCL 2.21.5
  • Framework (TF, PyTorch, MXNet): tensorflow-2.5.0
  • pip-24.0

themoonstone, Apr 17 '24 02:04