TensorFlow 1.14 in docker causes segfault

Open: pmh47 opened this issue 5 years ago • 13 comments

Using TensorFlow 1.14 in the Dockerfile results in a segfault in square_test.py. Backtrace:

#0  0x00007fff9f2877b7 in tensorflow::Status tensorflow::shape_inference::InferenceContext::GetAttr<int>(absl::string_view, int*) const ()
   from /usr/local/lib/python2.7/dist-packages/dirt/librasterise.so
#1  0x00007fff9f280ae6 in {lambda(tensorflow::shape_inference::InferenceContext*)#1}::operator()(tensorflow::shape_inference::InferenceContext*) const ()
   from /usr/local/lib/python2.7/dist-packages/dirt/librasterise.so
#2  0x00007fff9f280d7e in {lambda(tensorflow::shape_inference::InferenceContext*)#1}::_FUN(tensorflow::shape_inference::InferenceContext*) ()
   from /usr/local/lib/python2.7/dist-packages/dirt/librasterise.so
#3  0x00007fff9f287ea9 in std::_Function_handler<tensorflow::Status (tensorflow::shape_inference::InferenceContext*), tensorflow::Status (*)(tensorflow::shape_inference::InferenceContext*)>::_M_invoke(std::_Any_data const&, tensorflow::shape_inference::InferenceContext*&&) ()
   from /usr/local/lib/python2.7/dist-packages/dirt/librasterise.so
#4  0x00007fffa6ef6a9d in tensorflow::shape_inference::InferenceContext::Run(std::function<tensorflow::Status (tensorflow::shape_inference::InferenceContext*)> const&)
    () from /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so.1
#5  0x00007fffb016b350 in tensorflow::ShapeRefiner::RunShapeFn(tensorflow::Node const*, tensorflow::OpRegistrationData const*, tensorflow::ExtendedInferenceContext*) ()
   from /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x00007fffb016ce78 in tensorflow::ShapeRefiner::AddNode(tensorflow::Node const*) ()
   from /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#7  0x00007fffacfba75a in TF_FinishOperation () from /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8  0x00007fffaa82dbd6 in _wrap_TF_FinishOperation () from /usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#9  0x00000000004bc4aa in PyEval_EvalFrameEx ()
#10 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#11 0x00000000004c1f56 in PyEval_EvalFrameEx ()
#12 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#13 0x00000000004d57a3 in ?? ()
#14 0x00000000004eef5e in ?? ()
#15 0x00000000004eeb66 in ?? ()
#16 0x00000000004aaafb in ?? ()
#17 0x00000000004c166d in PyEval_EvalFrameEx ()
#18 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#19 0x00000000004d57a3 in ?? ()
#20 0x00000000004a587e in PyObject_Call ()
#21 0x00000000004be51e in PyEval_EvalFrameEx ()
#22 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#23 0x00000000004c1f56 in PyEval_EvalFrameEx ()
#24 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#25 0x00000000004c17c6 in PyEval_EvalFrameEx ()
#26 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#27 0x00000000004c17c6 in PyEval_EvalFrameEx ()
#28 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#29 0x00000000004c17c6 in PyEval_EvalFrameEx ()
#30 0x00000000004c141f in PyEval_EvalFrameEx ()
#31 0x00000000004c141f in PyEval_EvalFrameEx ()
#32 0x00000000004b9b66 in PyEval_EvalCodeEx ()
#33 0x00000000004eb69f in ?? ()
#34 0x00000000004e58f2 in PyRun_FileExFlags ()
#35 0x00000000004e41a6 in PyRun_SimpleFileExFlags ()
#36 0x00000000004938ce in Py_Main ()
#37 0x00007ffff7810830 in __libc_start_main (main=0x493370 <main>, argc=2, argv=0x7fffffffe458, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 

pmh47, Jul 02 '19 10:07

Original report: https://github.com/pmh47/dirt/issues/2#issuecomment-507221853

pmh47, Jul 02 '19 10:07

This is due to the python2.7 and python3.5 pip packages for tensorflow-gpu==1.14 being built with gcc 4.8, while dirt is built with a newer compiler. Apparently there is a binary incompatibility that setting _GLIBCXX_USE_CXX11_ABI does not remedy. Conversely, their python3.7 package is built with gcc 5.4, which works fine. It is unclear why tensorflow versions <= 1.13 all work too (presumably luck), given that they are also built with gcc 4.8. Related: https://github.com/tensorflow/tensorflow/issues/29951
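To see what an installed TensorFlow was built with, you can inspect its compile flags and compiler version. The sketch below assumes `tf.sysconfig.get_compile_flags()` (present in TF 1.x releases of this era) and assumes `COMPILER_VERSION` is exposed under `tf.version` (falling back to `tf` itself); the helper names are hypothetical. Since a matching `_GLIBCXX_USE_CXX11_ABI` value alone did not fix the gcc 4.8 problem, it reports the compiler major version as well.

```python
import re
from typing import List, Optional


def cxx11_abi_flag(flags: List[str]) -> Optional[int]:
    """Return the _GLIBCXX_USE_CXX11_ABI value from a list of compiler flags, or None."""
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


def gcc_major(version: str) -> Optional[int]:
    """Extract the major version from strings such as '4.8.5' or '7.3.1 20180303'."""
    match = re.match(r"\s*(\d+)\.", version)
    return int(match.group(1)) if match else None


def report() -> None:
    """Print the ABI flag and compiler major version of the installed TensorFlow."""
    import tensorflow as tf  # imported lazily so the helpers above work without TF

    flags = tf.sysconfig.get_compile_flags()
    print("_GLIBCXX_USE_CXX11_ABI =", cxx11_abi_flag(flags))
    # Where COMPILER_VERSION lives is an assumption; fall back to "" if it is absent.
    version = getattr(getattr(tf, "version", tf), "COMPILER_VERSION", "")
    print("compiler major version:", gcc_major(version))
```

Calling `report()` under the same interpreter that imports dirt shows whether TensorFlow came from a gcc 4.x build, which per the comment above is the problematic case.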

The proper, general fix is to build the op within a docker container derived from tensorflow/tensorflow:custom-op-gpu; this guarantees compatibility with tensorflow's official pip packages.

pmh47, Jul 02 '19 14:07

Isn't it simpler to switch to python 3.7?

francoisruty, Jul 02 '19 17:07

Yes, for docker; however, contrary to what I first thought, the issue also affects non-docker installs in certain cases.

For a Dockerfile that uses python 3.7 and Ubuntu 16.04, see this gist, which works with tensorflow 1.13.1 and tensorflow 1.14. However, it's not ideal: (i) forcing python 3.7 limits it to tensorflow 1.13 and newer, and (ii) it uses a root-installed, non-system python (from ppa:deadsnakes).

pmh47, Jul 02 '19 19:07

I still have the Dockerfile with the OpenGL edit you suggested. I also changed python to python3.7 and now use Ubuntu 18.04. I build with this command:

sudo docker build -t dirt --build-arg CUDA_BASE_VERSION=10.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=18.04 --build-arg TENSORFLOW_VERSION=1.14.0 .

I get:

Step 20/21 : RUN cd ~ && git clone https://github.com/pmh47/dirt.git &&  	pip3 install dirt/
 ---> Running in 1bc5bdd40c46
Cloning into 'dirt'...
Processing ./dirt
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from dirt==0.3.0)
Requirement already satisfied: tensorflow-gpu>=1.6 in /usr/local/lib/python3.6/dist-packages (from dirt==0.3.0)
Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: tensorboard<1.15.0,>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: wheel>=0.26 in /usr/lib/python3/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow-gpu>=1.6->dirt==0.3.0)
Collecting setuptools>=41.0.0 (from tensorboard<1.15.0,>=1.14.0->tensorflow-gpu>=1.6->dirt==0.3.0)
  Downloading https://files.pythonhosted.org/packages/ec/51/f45cea425fd5cb0b0380f5b0f048ebc1da5b417e48d304838c02d6288a1e/setuptools-41.0.1-py2.py3-none-any.whl (575kB)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow-gpu>=1.6->dirt==0.3.0)
Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow-gpu>=1.6->dirt==0.3.0)
Installing collected packages: dirt, setuptools
  Running setup.py install for dirt: started
    Running setup.py install for dirt: finished with status 'error'
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-rbcxbn31-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0lpnv8lc-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    -- The CXX compiler identification is GNU 7.4.0
    -- The CUDA compiler identification is NVIDIA 10.0.130
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    CMake Warning (dev) at /usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:275 (message):
      Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
      available.  Run "cmake --help-policy CMP0072" for policy details.  Use the
      cmake_policy command to set the policy and suppress this warning.
    
      FindOpenGL found both a legacy GL library:
    
        OPENGL_gl_LIBRARY: /usr/lib/x86_64-linux-gnu/libGL.so
    
      and GLVND libraries for OpenGL and GLX:
    
        OPENGL_opengl_LIBRARY: /usr/lib/x86_64-linux-gnu/libOpenGL.so
        OPENGL_glx_LIBRARY: /usr/lib/x86_64-linux-gnu/libGLX.so
    
      OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
      compatibility with CMake 3.10 and below the legacy GL library will be used.
    Call Stack (most recent call first):
      CMakeLists.txt:5 (find_package)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found OpenGL: /usr/lib/x86_64-linux-gnu/libOpenGL.so
    CMake Error at CMakeLists.txt:26 (message):
      cannot find either cuda_launch_config.h or gpu_launch_config.h
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-rbcxbn31-build/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-rbcxbn31-build/setup.py", line 50, in <module>
        'Programming Language :: Python :: 3.7',
      File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 129, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python3.6/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-rbcxbn31-build/setup.py", line 24, in run
        build_csrc()
      File "/tmp/pip-rbcxbn31-build/setup.py", line 18, in build_csrc
        subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
      File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-rbcxbn31-build/csrc']' returned non-zero exit status 1.

francoisruty, Jul 03 '19 11:07

The dockerfile clones from github rather than using COPY on your local version, so beware that local changes to CMakeLists etc. won't be picked up. That said, the repo version works fine for me in a similar configuration, using the dockerfile I posted in a gist earlier.

Note that your log references python3.6. Did the python3.7 install work correctly? Are you perhaps using the system pip3, which points at the system python3.6?
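One quick way to spot this kind of interpreter mismatch in a build log is to collect which `pythonX.Y` package directories pip reports. A minimal sketch; the helper name is hypothetical, and it only recognises paths of the form `/lib/pythonX.Y/dist-packages` (or `site-packages`), so the unversioned `/usr/lib/python3/dist-packages` used by Debian/Ubuntu system packages is ignored.

```python
import re
from typing import Iterable, Set

# Matches e.g. /usr/local/lib/python3.6/dist-packages or .../lib/python3.7/site-packages
_SITE_DIR = re.compile(r"/lib/python(\d+\.\d+)/(?:dist|site)-packages")


def pythons_in_pip_log(lines: Iterable[str]) -> Set[str]:
    """Collect the Python versions whose package directories appear in pip output."""
    found = set()
    for line in lines:
        for version in _SITE_DIR.findall(line):
            found.add(version)
    return found
```

Fed the "Requirement already satisfied" lines from the log above, this yields `{"3.6"}`, which is the giveaway that pip3 was bound to python3.6 rather than python3.7.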

I suspect the error "cannot find either cuda_launch_config.h or gpu_launch_config.h" is related to that python version mismatch. You could check whether the second header does in fact exist under /usr/local/lib/python3.7/dist-packages/tensorflow/include/tensorflow/core/util/
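That check can be scripted. The sketch below asks TensorFlow for its header directory via `tf.sysconfig.get_include()` and looks for either header under `tensorflow/core/util`. The helper names are hypothetical, and the search order (cuda_launch_config.h first) is an assumption based only on the wording of the CMake error, not on dirt's actual CMakeLists.txt.

```python
import os
from typing import Optional

# Header names taken from the CMake error message; the order is an assumption.
HEADERS = ("cuda_launch_config.h", "gpu_launch_config.h")


def find_launch_config(tf_include_dir: str) -> Optional[str]:
    """Return the first launch-config header under <include>/tensorflow/core/util, or None."""
    util_dir = os.path.join(tf_include_dir, "tensorflow", "core", "util")
    for name in HEADERS:
        candidate = os.path.join(util_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def check_installed_tensorflow() -> Optional[str]:
    """Locate TF's headers for the current interpreter and look for either header."""
    import tensorflow as tf  # imported lazily so find_launch_config works without TF

    return find_launch_config(tf.sysconfig.get_include())
```

Run it with the same interpreter that pip uses for the build; if it returns None there but a path under python3.7, the build is picking up the wrong TensorFlow install.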

pmh47, Jul 03 '19 11:07

You're right, I had replaced the pip commands with pip3. Do you know the right syntax to force pip3 to use python 3.7? Is it pip-3.7 install XXX? I googled it, but it's not very clear; several different syntaxes are suggested.

francoisruty, Jul 03 '19 11:07

It's python3.7 -m pip install ..., assuming python3.7 is the name of your python 3.7 binary.
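The same idea expressed from Python: running pip as a module of a specific interpreter guarantees that packages land in that interpreter's site-packages, whatever `pip3` on PATH happens to point at. A minimal sketch; the helper names are hypothetical, and `"python3.7"` assumes that is the binary's name on PATH.

```python
import subprocess
import sys
from typing import List


def pip_install_cmd(*packages: str, python: str = sys.executable) -> List[str]:
    """Build `<python> -m pip install <packages...>` so pip runs under that exact interpreter."""
    return [python, "-m", "pip", "install", *packages]


def pip_install(*packages: str, python: str = sys.executable) -> None:
    """Run the command built above; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_cmd(*packages, python=python))
```

For example, `pip_install("dirt/", python="python3.7")` runs the command from the reply above. Running `python3.7 -m pip --version` first shows which Python pip is bound to.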

pmh47, Jul 03 '19 11:07

I have other issues, but it does indeed work with your gist, so it's OK on my side.

francoisruty, Jul 03 '19 15:07

I get a similar error when building the docker image:

cannot find either cuda_launch_config.h or gpu_launch_config.h

Any suggestions?

Build args: CUDA_BASE_VERSION=10.0, UBUNTU_VERSION=18.04, CUDNN_VERSION=7.6.1.34, TENSORFLOW_VERSION=2.0.0b0

Full log:

ERROR: Complete output from command /usr/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-req-build-nsa0ngpn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-549ihdtj --python-tag cp37:
  ERROR: running bdist_wheel
  running build
  -- The CXX compiler identification is GNU 7.4.0
  -- The CUDA compiler identification is NVIDIA 10.0.130
  -- Check for working CXX compiler: /usr/bin/c++
  -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  CMake Warning (dev) at /usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:275 (message):
    Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
    available.  Run "cmake --help-policy CMP0072" for policy details.  Use the
    cmake_policy command to set the policy and suppress this warning.
  
    FindOpenGL found both a legacy GL library:
  
      OPENGL_gl_LIBRARY: /usr/lib/x86_64-linux-gnu/libGL.so
  
    and GLVND libraries for OpenGL and GLX:
  
      OPENGL_opengl_LIBRARY: /usr/lib/x86_64-linux-gnu/libOpenGL.so
      OPENGL_glx_LIBRARY: /usr/lib/x86_64-linux-gnu/libGLX.so
  
    OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
    compatibility with CMake 3.10 and below the legacy GL library will be used.
  Call Stack (most recent call first):
    CMakeLists.txt:5 (find_package)
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  -- Found OpenGL: /usr/lib/x86_64-linux-gnu/libOpenGL.so
  CMake Error at CMakeLists.txt:26 (message):
    cannot find either cuda_launch_config.h or gpu_launch_config.h
  
  
  -- Configuring incomplete, errors occurred!
  See also "/tmp/pip-req-build-nsa0ngpn/build/CMakeFiles/CMakeOutput.log".
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 50, in <module>
      'Programming Language :: Python :: 3.7',
    File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 129, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 204, in run
      self.run_command('build')
    File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 24, in run
      build_csrc()
    File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 18, in build_csrc
      subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
    File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-req-build-nsa0ngpn/csrc']' returned non-zero exit status 1.
  ----------------------------------------
  ERROR: Failed building wheel for dirt
  Running setup.py clean for dirt
Failed to build dirt
Installing collected packages: dirt
  Running setup.py install for dirt: started
    Running setup.py install for dirt: finished with status 'error'
    ERROR: Complete output from command /usr/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-req-build-nsa0ngpn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-j8000oqc/install-record.txt --single-version-externally-managed --compile:
    ERROR: running install
    running build
    CMake Warning (dev) at /usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:275 (message):
      Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
      available.  Run "cmake --help-policy CMP0072" for policy details.  Use the
      cmake_policy command to set the policy and suppress this warning.
    
      FindOpenGL found both a legacy GL library:
    
        OPENGL_gl_LIBRARY: /usr/lib/x86_64-linux-gnu/libGL.so
    
      and GLVND libraries for OpenGL and GLX:
    
        OPENGL_opengl_LIBRARY: /usr/lib/x86_64-linux-gnu/libOpenGL.so
        OPENGL_glx_LIBRARY: /usr/lib/x86_64-linux-gnu/libGLX.so
    
      OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
      compatibility with CMake 3.10 and below the legacy GL library will be used.
    Call Stack (most recent call first):
      CMakeLists.txt:5 (find_package)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    CMake Error at CMakeLists.txt:26 (message):
      cannot find either cuda_launch_config.h or gpu_launch_config.h
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-req-build-nsa0ngpn/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 50, in <module>
        'Programming Language :: Python :: 3.7',
      File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 129, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python3.7/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 24, in run
        build_csrc()
      File "/tmp/pip-req-build-nsa0ngpn/setup.py", line 18, in build_csrc
        subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
      File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-req-build-nsa0ngpn/csrc']' returned non-zero exit status 1.
    ----------------------------------------
ERROR: Command "/usr/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-req-build-nsa0ngpn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-j8000oqc/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-nsa0ngpn/

zhangxuan1918, Jul 06 '19 06:07

Using TENSORFLOW_VERSION=2.0.0-beta1 and python2.7 resolves the issue. However, the test is broken:

Traceback (most recent call last):
  File "/root/dirt/tests/square_test.py", line 54, in <module>
    main()
  File "/root/dirt/tests/square_test.py", line 41, in main
    session = tf.Session()
AttributeError: 'module' object has no attribute 'Session'

zhangxuan1918, Jul 06 '19 13:07

@zhangxuan1918

Yes, their 2.0.0b0 release has a bug: libtensorflow_framework is named wrongly.

I don't officially support 2.0 yet, which is why square_test.py fails. If you want to run it, patch it within the container (delete lines 41-42 and change .eval() to .numpy() on lines 44-45).
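The manual patch can be applied with a small script. This is a hypothetical sketch: it assumes both sets of line numbers refer to the original, unpatched square_test.py (so deletion does not shift lines 44-45), and those numbers may not match other revisions of the file.

```python
from pathlib import Path
from typing import Iterable


def patch_for_tf2(source: str,
                  delete_lines: Iterable[int] = (41, 42),
                  eval_lines: Iterable[int] = (44, 45)) -> str:
    """Drop the given 1-based lines and replace .eval() with .numpy() on others.

    All line numbers refer to the ORIGINAL text, so the two edits do not interact.
    """
    delete = set(delete_lines)
    convert = set(eval_lines)
    out = []
    for number, line in enumerate(source.splitlines(keepends=True), start=1):
        if number in delete:
            continue  # e.g. the tf.Session() setup that TF 2.0 no longer provides
        if number in convert:
            line = line.replace(".eval()", ".numpy()")  # eager tensors expose .numpy()
        out.append(line)
    return "".join(out)


def patch_file(path: str) -> None:
    """Rewrite a file in place using patch_for_tf2 (hypothetical helper)."""
    p = Path(path)
    p.write_text(patch_for_tf2(p.read_text()))
```

Inside the container this would be something like `patch_file("/root/dirt/tests/square_test.py")`, the path shown in the traceback above.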

I think it failed under python3 because your python3 tensorflow was installed to the wrong location. If you use this version of the dockerfile, it should work as-is. If you want Ubuntu 18.04 rather than 16.04, just change line 4 accordingly and add python3-distutils to the apt install on line 22.

pmh47, Jul 06 '19 13:07

I have the same error:

ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3.5 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-y6jeexy4
       cwd: /tmp/pip-req-build-h9gn0ti5/
  Complete output (46 lines):
  /usr/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'python_requires'
    warnings.warn(msg)
  running bdist_wheel
  running build
  -- The CXX compiler identification is GNU 5.4.0
  -- The CUDA compiler identification is NVIDIA 9.0.176
  -- Check for working CXX compiler: /usr/bin/c++
  -- Check for working CXX compiler: /usr/bin/c++ -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Found OpenGL: /usr/local/lib/x86_64-linux-gnu/libOpenGL.so  found components:  OpenGL EGL
  CMake Error at CMakeLists.txt:26 (message):
    cannot find either cuda_launch_config.h or gpu_launch_config.h
  
  
  -- Configuring incomplete, errors occurred!
  See also "/tmp/pip-req-build-h9gn0ti5/build/CMakeFiles/CMakeOutput.log".
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 50, in <module>
      'Programming Language :: Python :: 3.7',
    File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
      dist.run_commands()
    File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 179, in run
      self.run_command('build')
    File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 24, in run
      build_csrc()
    File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 18, in build_csrc
      subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
    File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-req-build-h9gn0ti5/csrc']' returned non-zero exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for dirt
  Running setup.py clean for dirt
Failed to build dirt
Installing collected packages: dirt
    Running setup.py install for dirt ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3.5 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gpb031_1/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.5/dirt
         cwd: /tmp/pip-req-build-h9gn0ti5/
    Complete output (35 lines):
    /usr/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)
    running install
    running build
    CMake Error at CMakeLists.txt:26 (message):
      cannot find either cuda_launch_config.h or gpu_launch_config.h
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-req-build-h9gn0ti5/build/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 50, in <module>
        'Programming Language :: Python :: 3.7',
      File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/usr/lib/python3.5/distutils/command/install.py", line 583, in run
        self.run_command('build')
      File "/usr/lib/python3.5/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 24, in run
        build_csrc()
      File "/tmp/pip-req-build-h9gn0ti5/setup.py", line 18, in build_csrc
        subprocess.check_call(['cmake', os.path.join(base_path, 'csrc')], cwd=build_path)
      File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-req-build-h9gn0ti5/csrc']' returned non-zero exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3.5 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-h9gn0ti5/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gpb031_1/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.5/dirt Check the logs for full command output.

The culprit is the error "cannot find either cuda_launch_config.h or gpu_launch_config.h".

I am running this in a docker container with CUDA_BASE_VERSION=9.0, UBUNTU_VERSION=16.04, CUDNN_VERSION=7.6.0.64, TENSORFLOW_VERSION=1.12.0 and PYTHON=3.5.

I also tried tensorflow 1.14 with python 3.7, but got the same error.

Any advice?

tpatten, Sep 11 '20 12:09