dl-docker
Fixed some issues with the GPU Dockerfile
Fixes: the cuDNN6 / libcudnn.so.6 issue (#59), latest versions of the deep learning libraries, pandas and sklearn upgrades; also added some of my favorite Python libraries.
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu14.04
LABEL authors="Sai Soundararaj <[email protected]>, Pankesh Bamotra <[email protected]>"
ARG THEANO_VERSION=rel-0.9.0
ARG TENSORFLOW_VERSION=1.3.0
ARG TENSORFLOW_ARCH=gpu
ARG KERAS_VERSION=2.0.8
ARG LASAGNE_VERSION=v0.1
ARG TORCH_VERSION=latest
ARG CAFFE_VERSION=master
ARG CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz
#RUN echo -e "\n**********************\nNVIDIA Driver Version\n**********************\n" && \
# cat /proc/driver/nvidia/version && \
# echo -e "\n**********************\nCUDA Version\n**********************\n" && \
# nvcc -V && \
# echo -e "\n\nBuilding your Deep Learning Docker Image...\n"
# Install some dependencies
RUN apt-get update && apt-get install -y \
bc \
build-essential \
cmake \
curl \
g++ \
gfortran \
git \
libffi-dev \
libfreetype6-dev \
libhdf5-dev \
libjpeg-dev \
liblcms2-dev \
libopenblas-dev \
liblapack-dev \
libopenjpeg2 \
libpng12-dev \
libssl-dev \
libtiff5-dev \
libwebp-dev \
libzmq3-dev \
nano \
pkg-config \
python-dev \
software-properties-common \
unzip \
vim \
wget \
zlib1g-dev \
qt5-default \
libvtk6-dev \
libpng-dev \
libjasper-dev \
libopenexr-dev \
libgdal-dev \
libdc1394-22-dev \
libavcodec-dev \
libavformat-dev \
libswscale-dev \
libtheora-dev \
libvorbis-dev \
libxvidcore-dev \
libx264-dev \
yasm \
libopencore-amrnb-dev \
libopencore-amrwb-dev \
libv4l-dev \
libxine2-dev \
libtbb-dev \
libeigen3-dev \
python-tk \
python-numpy \
python3-dev \
python3-tk \
python3-numpy \
ant \
default-jdk \
doxygen \
&& \
apt-get clean && \
apt-get autoremove && \
rm -rf /var/lib/apt/lists/* && \
# Link BLAS library to use OpenBLAS using the alternatives mechanism (https://www.scipy.org/scipylib/building/linux.html#debian-ubuntu)
update-alternatives --set libblas.so.3 /usr/lib/openblas-base/libblas.so.3
# Install cuDNN v6.0
RUN wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
cd /root/downloads && \
tar -xzvf ${CUDNN_TAR_FILE}
ADD cuda/include/cudnn.h /usr/local/cuda-8.0/include
ADD cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/
RUN chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*
ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
RUN cd /usr/local/cuda/lib64 && \
rm libcudnn.so && \
rm libcudnn.so.6 && \
ln libcudnn.so.6.* libcudnn.so.6 && \
ln libcudnn.so.6 libcudnn.so && \
ldconfig
# Install pip
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
# Add SNI support to Python
RUN pip --no-cache-dir install \
pyopenssl \
ndg-httpsclient \
pyasn1
# Install useful Python packages using apt-get to avoid version incompatibilities with Tensorflow binary
# especially numpy, scipy, skimage and sklearn (see https://github.com/tensorflow/tensorflow/issues/2034)
RUN apt-get update && apt-get install -y \
python-numpy \
python-scipy \
python-nose \
python-h5py \
python-skimage \
python-matplotlib \
python-pandas \
python-sklearn \
python-sympy \
&& \
apt-get clean && \
apt-get autoremove && \
rm -rf /var/lib/apt/lists/*
# Install other useful Python packages using pip
RUN pip --no-cache-dir install --upgrade ipython pandas sklearn && \
pip --no-cache-dir install \
Cython \
click \
grequests \
h5py \
python-dotenv \
sqlalchemy-redshift \
gevent \
awscli \
ipykernel \
jupyter \
path.py \
Pillow \
pygments \
six \
sphinx \
wheel \
zmq \
&& \
python -m ipykernel.kernelspec
# Install TensorFlow
RUN pip --no-cache-dir install \
https://storage.googleapis.com/tensorflow/linux/${TENSORFLOW_ARCH}/tensorflow_${TENSORFLOW_ARCH}-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
# Install dependencies for Caffe
RUN apt-get update && apt-get install -y \
libboost-all-dev \
libgflags-dev \
libgoogle-glog-dev \
libhdf5-serial-dev \
libleveldb-dev \
liblmdb-dev \
libopencv-dev \
libprotobuf-dev \
libsnappy-dev \
protobuf-compiler \
&& \
apt-get clean && \
apt-get autoremove && \
rm -rf /var/lib/apt/lists/*
# Install Caffe
RUN git clone -b ${CAFFE_VERSION} --depth 1 https://github.com/BVLC/caffe.git /root/caffe && \
cd /root/caffe && \
cat python/requirements.txt | xargs -n1 pip install && \
mkdir build && cd build && \
cmake -DUSE_CUDNN=1 -DBLAS=Open .. && \
make -j"$(nproc)" all && \
make install
# Set up Caffe environment variables
ENV CAFFE_ROOT=/root/caffe
ENV PYCAFFE_ROOT=$CAFFE_ROOT/python
ENV PYTHONPATH=$PYCAFFE_ROOT:$PYTHONPATH \
PATH=$CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
RUN echo "$CAFFE_ROOT/build/lib" >> /etc/ld.so.conf.d/caffe.conf && ldconfig
# Install Theano and set up Theano config (.theanorc) for CUDA and OpenBLAS
RUN pip --no-cache-dir install git+git://github.com/Theano/Theano.git@${THEANO_VERSION} && \
\
echo "[global]\ndevice=gpu\nfloatX=float32\noptimizer_including=cudnn\nmode=FAST_RUN \
\n[lib]\ncnmem=0.95 \
\n[nvcc]\nfastmath=True \
\n[blas]\nldflag = -L/usr/lib/openblas-base -lopenblas \
\n[DebugMode]\ncheck_finite=1" \
> /root/.theanorc
# Install Keras
RUN pip --no-cache-dir install git+git://github.com/fchollet/keras.git@${KERAS_VERSION}
# Install Lasagne
RUN pip --no-cache-dir install git+git://github.com/Lasagne/Lasagne.git@${LASAGNE_VERSION}
# Install Torch
RUN git clone https://github.com/torch/distro.git /root/torch --recursive && \
cd /root/torch && \
bash install-deps && \
yes no | ./install.sh
# Export the LUA environment variables manually
ENV LUA_PATH='/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua' \
LUA_CPATH='/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so' \
PATH=/root/torch/install/bin:$PATH \
LD_LIBRARY_PATH=/root/torch/install/lib:$LD_LIBRARY_PATH \
DYLD_LIBRARY_PATH=/root/torch/install/lib:$DYLD_LIBRARY_PATH
ENV LUA_CPATH='/root/torch/install/lib/?.so;'$LUA_CPATH
# Install the latest versions of nn, cutorch, cunn, cuDNN bindings and iTorch
RUN luarocks install nn && \
luarocks install cutorch && \
luarocks install cunn && \
luarocks install loadcaffe && \
\
cd /root && git clone https://github.com/soumith/cudnn.torch.git && cd cudnn.torch && \
git checkout R4 && \
luarocks make && \
\
cd /root && git clone https://github.com/facebook/iTorch.git && \
cd iTorch && \
luarocks make
# Install OpenCV
RUN git clone --depth 1 https://github.com/opencv/opencv.git /root/opencv && \
cd /root/opencv && \
mkdir build && \
cd build && \
cmake -DCMAKE_LIBRARY_PATH=/usr/local/cuda/lib64/stubs -DWITH_QT=ON -DWITH_OPENGL=ON -DFORCE_VTK=ON -DWITH_TBB=ON -DWITH_GDAL=ON -DWITH_XINE=ON -DBUILD_EXAMPLES=ON .. && \
make -j"$(nproc)" && \
make install && \
ldconfig && \
echo 'ln /dev/null /dev/raw1394' >> ~/.bashrc
# Expose Ports for TensorBoard (6006), Ipython (8888), Flask service(8080)
EXPOSE 6006 8888 8080
WORKDIR "/root"
CMD ["/bin/bash"]
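Incidentally, the relink dance in the cuDNN RUN step (rm + ln + ldconfig) can be sanity-checked outside Docker. This sketch recreates it with dummy files in a scratch directory (all file contents here are stand-ins, not the real library):

```shell
# Recreate the Dockerfile's libcudnn hard-link chain with dummy files
demo=$(mktemp -d) && cd "$demo"
echo "fake cudnn" > libcudnn.so.6.0.21   # stand-in for the versioned library
ln libcudnn.so.6.0.21 libcudnn.so.6      # hard link, exactly as the RUN step does
ln libcudnn.so.6 libcudnn.so
# All three names now share one inode, so a loader asking for
# libcudnn.so.6 (what TensorFlow dlopens) reaches the same file.
cat libcudnn.so
```

Since these are hard links rather than symlinks, each name resolves independently even if the others are removed.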
Is this supposed to work in the repository I just cloned? Apart from the "build" instructions missing a parameter (I supplied a dot "." for the current path), I could not get a GPU build; log attached as error.txt.
@reppolice No Pull Request was made for this code change. The easy solution is to clone this repo: git@github.com:Paperspace/dl-docker.git
@jtryan I cloned that one, but didn't have much success. I don't think the problem would be on my side, would it?
... 2017-10-21 20:47:49 (1.84 MB/s) - '/root/downloads/cudnn-8.0-linux-x64-v6.0.tgz' saved [201134139/201134139]
cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.6
cuda/lib64/libcudnn.so.6.0.21
cuda/lib64/libcudnn_static.a
 ---> fb47c42ccca4
Removing intermediate container 871dccc622e8
Step 13/40 : ADD cuda/include/cudnn.h /usr/local/cuda-8.0/include
ADD failed: stat /var/lib/docker/tmp/docker-builder494311789/cuda/include/cudnn.h: no such file or directory
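The ADD failure at the end of that log is the root cause here: ADD copies files from the host build context, but the tarball was extracted inside the container at /root/downloads, so cuda/include/cudnn.h never exists on the host. One way around it (a sketch, not an official patch) is to drop the two ADD lines and do the copy inside the same RUN that extracts the archive:

```dockerfile
# The copy now happens inside the image, so no build-context lookup is involved
RUN wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
    cd /root/downloads && \
    tar -xzvf ${CUDNN_TAR_FILE} && \
    cp cuda/include/cudnn.h /usr/local/cuda-8.0/include/ && \
    cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
    chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*
```

The -P flag preserves the libcudnn.so → libcudnn.so.6 → libcudnn.so.6.0.21 links instead of copying three full files.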
Try this. I'm just a beginner with Docker so excuse my hack-y way of doing things.
@reppolice No, I got the same error. @pbamotra's change to Dockerfile.gpu, adding this section, should work fine. If you replace Dockerfile.gpu from this repo with his file above, it builds. It is a long build though... :smile:
Hi, I'm running into the cuDNN6 / libcudnn.so.6 issue when trying to import TensorFlow (tf-nightly-gpu==1.5.0-dev20171127).
Would you be willing to provide me with instructions on how to resolve this? I'm not sure what exactly to do with the Dockerfile, but I will try to figure it out in the meantime.
Thanks in advance. Aside from this issue (which is blocking training), using FloydHub has been great.
EDIT
I partially resolved the issue with the following setup.sh script:
#!/bin/bash
echo 'PATH is:'
echo $PATH
echo 'LD_LIBRARY_PATH is:'
echo $LD_LIBRARY_PATH
echo "lib64:"
ls /usr/local/cuda/lib64/ | grep "libcudnn"
echo "include: "
ls /usr/local/cuda/include/ | grep "cudnn"
CUDNN_TAR_FILE=cudnn-8.0-linux-x64-v6.0.tgz
wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/${CUDNN_TAR_FILE} -P /root/downloads && \
cd /root/downloads && \
tar -xzvf ${CUDNN_TAR_FILE} && \
cp cuda/include/cudnn.h /usr/local/cuda-8.0/include && \
cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
chmod a+r /usr/local/cuda-8.0/lib64/libcudnn*
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
cd /usr/local/cuda/lib64 && \
rm libcudnn.so && \
rm libcudnn.so.6 && \
ln libcudnn.so.6.* libcudnn.so.6 && \
ln libcudnn.so.6 libcudnn.so && \
ldconfig
echo 'PATH is:'
echo $PATH
echo 'LD_LIBRARY_PATH is:'
echo $LD_LIBRARY_PATH
echo "lib64:"
ls /usr/local/cuda/lib64/ | grep "libcudnn"
echo "include: "
ls /usr/local/cuda/include/ | grep "cudnn"
#this may not be necessary or useful
pip3 install cudnn-python-wrappers
pip3 install tf-nightly-gpu
However, that does mean I need to download the ~200 MB cuDNN tarball on every build. Obviously this is less than ideal. Would FloydHub be willing to fix this?
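Where a Dockerfile build is available, one way to avoid re-downloading on every run (a sketch; file names assumed, not FloydHub's actual setup) is to fetch the tarball once on the host and COPY it into the image, so Docker's layer cache takes over:

```dockerfile
# Host, one time:
#   wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/cudnn-8.0-linux-x64-v6.0.tgz
# Then in the Dockerfile, in place of the wget step:
COPY cudnn-8.0-linux-x64-v6.0.tgz /root/downloads/
RUN cd /root/downloads && \
    tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz && \
    cp cuda/include/cudnn.h /usr/local/cuda-8.0/include/ && \
    cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/ && \
    ldconfig
```

As long as the tarball and these instructions don't change, rebuilds reuse the cached layers and skip the extraction entirely.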