Future support for Ubuntu 18.04 container on IBM OpenPower + Tesla P100 + CUDA 10.0

Open gustavovaliati opened this issue 5 years ago • 10 comments

First of all, congratulations on the great work you are doing.

This is a question.

I am aware that the supported platforms are Ubuntu 14.04 and 16.04. However, I am wondering whether there has already been any progress on supporting 18.04.

I am working on an IBM OpenPower machine with a Tesla P100, running a predefined Docker image built on Ubuntu 18.04 with CUDA 10.0. I am more or less restricted to that setup, and I would like to test the DeepDetect server on it. I am working through some compilation problems (like the one reported below) that I think are related to the different platform I am using.

Has anyone tried working in a similar environment? Thank you.

Checklist

Before creating a new issue, please make sure that:

If Ok, please give as many details as possible to help us solve the problem more efficiently.

Configuration

  • Version of DeepDetect:
    • [ ] Locally compiled on:
      • [ ] Ubuntu 14.04 LTS
      • [ ] Mac OSX
      • [x] Other: Ubuntu 18.04 LTS
    • [x] Docker
    • [ ] Amazon AMI
  • Commit (shown by the server when starting): Last commit from git log 85fa6c5b20b7aed455f390599d0ec71b82f513e8

Your question / the problem you're facing:

Error message (if any) / steps to reproduce the problem:

  • [ ] list of API calls:

  • [ ] Server log output:

  • [x] Compilation error:

~/workspace/deepdetect/build$ cmake .. -DUSE_SIMSEARCH=ON -DUSE_CUDNN=OFF -DCUDA_ARCH="-gencode arch=compute_60,code=sm_60" -DUSE_TF=ON -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF 
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   filesystem
--   thread
--   system
--   iostreams
--   chrono
--   date_time
--   atomic
--   regex
-- Fetching Annoy
-- CUDA detected: 10.0
-- Added CUDA NVCC flags for: sm_70
-- Fetching Tensorflow
-- OpenCV 3 (3.4.2) found (/home/myuser/anaconda3/share/OpenCV)
-- Configuring customized caffe
-- Build Tests          : OFF
-- Caffe DEBUG          : 
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "dede" in directory /home/myuser/workspace/deepdetect/main

-- Configuring incomplete, errors occurred!
See also "/home/myuser/workspace/deepdetect/build/CMakeFiles/CMakeOutput.log".
See also "/home/myuser/workspace/deepdetect/build/CMakeFiles/CMakeError.log".

gustavovaliati avatar Mar 29 '19 14:03 gustavovaliati

Hi @gustavovaliati thanks for the kind words.

This is a known difficulty, and the instructions to resolve it are below. However, if you are able to run a Docker image for the P100 instead, by following DeepDetect P100 docker, that's the preferred (painless) way :)

  • Building DeepDetect on Ubuntu 18.04 LTS requires building cpp-netlib "by hand", since it no longer appears to be part of the Ubuntu packages, as follows:
wget https://github.com/cpp-netlib/cpp-netlib/archive/cpp-netlib-0.11.2-final.tar.gz
tar xvzf cpp-netlib-0.11.2-final.tar.gz
cd cpp-netlib-cpp-netlib-0.11.2-final
mkdir build
cd build
cmake ..
make
sudo make install
  • Building DeepDetect on Ubuntu 18.04 LTS with CUDA 10 requires building a version of cmake that is more recent than the one shipped with Ubuntu. Do as follows:
wget https://github.com/Kitware/CMake/releases/download/v3.14.0/cmake-3.14.0.tar.gz
tar xvzf cmake-3.14.0.tar.gz
cd cmake-3.14.0
./bootstrap
make
sudo make install

Then proceed with the DeepDetect build as described in the Build for P100 GPU from source instructions.

We'll update the online documentation accordingly.
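Before re-running the DeepDetect configure step, it may help to confirm which cmake is actually picked up from PATH. The following is a minimal sketch, not part of the official instructions: the helper names `version_ge` and `cmake_version` are made up for illustration, and the 3.14.0 threshold is simply the version built above, not a verified minimum. It relies on GNU `sort -V`, which is available on Ubuntu.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Uses GNU sort -V (version sort) to find the smaller of the two.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the version of the cmake found on PATH; fails if there is none.
cmake_version() {
  command -v cmake >/dev/null 2>&1 || return 1
  # First line looks like: "cmake version 3.14.0"
  cmake --version | head -n1 | awk '{print $3}'
}

# Assumption: 3.14.0 is the version from the instructions above.
required="3.14.0"

if current="$(cmake_version)"; then
  if version_ge "$current" "$required"; then
    echo "cmake $current is recent enough"
  else
    echo "cmake $current is older than $required"
  fi
else
  echo "cmake not found on PATH"
fi
```

If an older version is reported after `sudo make install`, the distribution cmake is probably still ahead of `/usr/local/bin` on PATH.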

beniz avatar Mar 29 '19 15:03 beniz

Thank you for such a complete and quick response.

Good to know I was already heading toward a similar procedure to solve this. As soon as I test it, I will report back here. Cya.

gustavovaliati avatar Mar 29 '19 17:03 gustavovaliati

Hi! With a few steps added to your instructions, I have been able to overcome the initial problem. The changes are:

  • Remove current cmake: sudo apt-get remove cmake;
  • Install autoconf: sudo apt-get install autoconf;
  • Follow your instructions;
  • Create a symbolic link for the newly compiled and installed cmake: sudo ln -s /usr/local/bin/cmake /usr/bin/cmake ;
  • Update the CMAKE_ROOT to the new cmake: export CMAKE_ROOT=/usr/local/share/cmake-3.14 ;
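The steps in this list can be collected into one script. This is only a sketch under a few assumptions: the `run` wrapper, the `DRY_RUN` variable and the function name `switch_to_local_cmake` are hypothetical and not part of DeepDetect. By default it only prints the commands, so nothing is removed or installed until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
# Set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

switch_to_local_cmake() {
  # Remove the distribution cmake and install autoconf.
  run sudo apt-get remove cmake
  run sudo apt-get install autoconf
  # At this point, build and install cmake 3.14.0 from source
  # as described earlier in the thread (not repeated here).
  # Point /usr/bin/cmake at the freshly installed binary.
  run sudo ln -s /usr/local/bin/cmake /usr/bin/cmake
  # Tell cmake where its modules live; this only affects the current
  # shell, so source this file or repeat the export in your own session.
  run export CMAKE_ROOT=/usr/local/share/cmake-3.14
}

switch_to_local_cmake
```

Running it as-is prints the four commands prefixed with `+`, which makes it easy to review them before executing anything with sudo.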

Thank you! :+1:

Right now I am working on some problems when compiling the tests with cmake -DBUILD_TESTS=ON .. && make.

I have configured the initial DeepDetect build not to use cuDNN, since I don't have it: cmake .. -DUSE_SIMSEARCH=ON -DUSE_CUDNN=OFF -DCUDA_ARCH="-gencode arch=compute_60,code=sm_60" -DUSE_TF=ON -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF

Additionally, it is asking for bazel even though it is already installed.

||/ Name                                          Version                     Architecture                Description
+++-=============================================-===========================-===========================-===============================================================================================
ii  bazel                                         0.15.0-14232.d68440f11      ppc64el                     Correct, reproducible, fast builds for everyone

Current error:

compile_linux_protobuf.sh finished successfully!!!
tensorflow/contrib/makefile/downloads/nsync/builds/default.linux.c++11/nsync.a
Using CUDA from /usr/local/cuda
CUDA support enabled
sed: can't read /usr/local/cuda/include/cudnn.h: No such file or directory
Cannot find bazel. Please install bazel.
Configuration finished
./build_tensorflow.sh: line 52: bazel: command not found
CMakeFiles/tensorflow_shared_gpu.dir/build.make:107: recipe for target 'tensorflow-stamp/tensorflow_shared_gpu-configure' failed
make[5]: *** [tensorflow-stamp/tensorflow_shared_gpu-configure] Error 127
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/tensorflow_shared_gpu.dir/all' failed
make[4]: *** [CMakeFiles/tensorflow_shared_gpu.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/tensorflow_cc.dir/build.make:106: recipe for target 'tensorflow_cc/src/tensorflow_cc-stamp/tensorflow_cc-configure' failed
make[2]: *** [tensorflow_cc/src/tensorflow_cc-stamp/tensorflow_cc-configure] Error 2
CMakeFiles/Makefile2:146: recipe for target 'CMakeFiles/tensorflow_cc.dir/all' failed
make[1]: *** [CMakeFiles/tensorflow_cc.dir/all] Error 2
Makefile:94: recipe for target 'all' failed
make: *** [all] Error 2

As soon as I have any solution for that I am going to report it here.
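Since dpkg lists the package as installed while the build script reports `bazel: command not found`, one thing worth checking is whether the binary resolves on the PATH seen by the build. The sketch below is only a diagnostic aid; the helper name `locate_cmd` is hypothetical, and a PATH mismatch is a guess rather than a confirmed cause.

```shell
#!/bin/sh
# Reports where (or whether) a command resolves on the current PATH.
locate_cmd() {
  if path="$(command -v "$1" 2>/dev/null)"; then
    echo "$1 found at $path"
  else
    echo "$1 not on PATH"
  fi
}

locate_cmd bazel

# If dpkg is available, list files under a bin/ directory shipped by the
# bazel package, to compare their location against PATH.
if command -v dpkg >/dev/null 2>&1; then
  dpkg -L bazel 2>/dev/null | grep '/bin/' \
    || echo "no bin/ files listed for package bazel"
fi
```

If the package ships its binary in a directory that is not on PATH, adding that directory to PATH before running make would be the next thing to try.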

gustavovaliati avatar Apr 01 '19 18:04 gustavovaliati

Hi, unless you really need TF for some specific image model, I'd recommend building without it, at least at first. That being said, I'm pretty certain you need bazel 0.8 explicitly, otherwise it won't build; see https://github.com/jolibrain/deepdetect/blob/master/docker/gpu-caffe-tf/Dockerfile

You may also want to keep cuDNN support on; it's the default and most useful configuration, unless there's some good reason to deactivate it.
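As a rough illustration of this advice, the sketch below drops `-DUSE_TF=ON` from the configure line quoted earlier and only adds `-DUSE_CUDNN=OFF` when `cudnn.h` is missing, which is the file the build log complained about. The helper names and the use of a `CUDA_HOME` variable are assumptions for the example; `/usr/local/cuda` is the path shown in the log. It only prints a suggested command and does not run cmake.

```shell
#!/bin/sh
# Succeeds when cudnn.h exists under the given CUDA root.
has_cudnn_header() {
  [ -f "$1/include/cudnn.h" ]
}

# Prints the extra configure flag for a given CUDA root:
# nothing when cuDNN is present (cuDNN stays on by default),
# -DUSE_CUDNN=OFF when the header is missing.
cudnn_flag() {
  if has_cudnn_header "$1"; then
    return 0
  else
    echo "-DUSE_CUDNN=OFF"
  fi
}

# Assumption: CUDA_HOME may point at the CUDA install; otherwise fall
# back to the path shown in the build log.
cuda_root="${CUDA_HOME:-/usr/local/cuda}"

echo "Suggested configure line (TensorFlow left out):"
echo "cmake .. -DUSE_SIMSEARCH=ON -DCUDA_ARCH=\"-gencode arch=compute_60,code=sm_60\" -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF $(cudnn_flag "$cuda_root")"
```

Once this configuration builds, TensorFlow can be added back as a separate step so that bazel problems are isolated from the rest of the build.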

beniz avatar Apr 01 '19 20:04 beniz

For Ubuntu 18.04, I also needed to install the libssl-dev package.

panovr avatar Jun 11 '19 09:06 panovr

I remember having seen it missing here and there; it might depend on the primary OS install.

beniz avatar Jun 11 '19 11:06 beniz

By the way, for Ubuntu 18.04, do we still need to use the libcurlpp version from GitHub, https://github.com/jpbarrette/curlpp.git, as on Ubuntu 16.04?

panovr avatar Jun 11 '19 22:06 panovr

libcurlpp has been fixed in 18.04, let us know if the instructions are not clear: https://www.deepdetect.com/quickstart-server/?opts={%22os%22:%22ubuntu%22,%22source%22:%22build_source%22,%22compute%22:%22gpu%22,%22gpu%22:%22gtx%22,%22backend%22:[%22caffe%22,%22tsne%22,%22xgboost%22],%22deepdetect%22:%22server%22}

beniz avatar Jun 12 '19 17:06 beniz

I had some difficulties installing DD on Ubuntu 18.04: I had to install libboost-all-dev and libssl-dev before installing cpp-netlib. I finally wrote a bash script to automate the installation based on the commit version: https://gist.github.com/YaYaB/7d5b117d4a9976b73201f7fb28eaae95
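Independently of the linked gist, a quick pre-flight check for the two packages named above could look like the sketch below. The helper names `is_installed` and `missing_pkgs` are hypothetical, and the package list only covers what is mentioned in this thread, not the full DeepDetect dependency set.

```shell
#!/bin/sh
# Succeeds if dpkg reports the package as fully installed.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

# Prints, one per line, the packages from the arguments that are missing.
missing_pkgs() {
  for pkg in "$@"; do
    is_installed "$pkg" || printf '%s\n' "$pkg"
  done
}

# Packages reported in this thread as needed before building cpp-netlib.
missing="$(missing_pkgs libboost-all-dev libssl-dev)"

if [ -n "$missing" ]; then
  echo "Missing packages:"
  echo "$missing"
  # Unquoted on purpose so the names are joined on one line.
  echo "Install with: sudo apt-get install $(echo $missing)"
else
  echo "All listed packages are installed"
fi
```

The script only reports what is missing and leaves the actual `apt-get install` to the user.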

YaYaB avatar Jan 15 '20 17:01 YaYaB

FYI I have a rough Dockerfile that sets up DD on Ubuntu 18.04 (with caffe, TF, dlib backends) here: https://github.com/jolibrain/deepdetect/issues/687#issuecomment-572749679

Just make sure to use TF v1.13.1 and bazel v0.21.0 to avoid the errors I had in #687 :)

cchadowitz avatar Jan 15 '20 21:01 cchadowitz