
Failed to create an image using the supplied Dockerfile

huizhang0110 opened this issue 6 years ago (status: open, 5 comments)

[Screenshot from 2017-12-24 16-11-32]

The base image referenced in the Dockerfile is invalid.

huizhang0110, Dec 24 '17 08:12

I ran into this problem as well. Have you solved it?

Jayhello, Jan 16 '18 07:01

Change the second line of the Dockerfile to FROM kaixhin/cuda-torch:8.0 and the build will get much further.

However, it then stops while building thrift 24.0 at the make stage:

/usr/bin/python setup.py build
Traceback (most recent call last):
  File "setup.py", line 39, in <module>
    run_setup()
  File "setup.py", line 36, in run_setup
    zip_safe = False,
  File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/usr/local/lib/python2.7/dist-packages/setuptools/dist.py", line 321, in __init__
    _Distribution.__init__(self, attrs)
  File "/usr/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/usr/local/lib/python2.7/dist-packages/setuptools/dist.py", line 389, in finalize_options
    ep.require(installer=self.fetch_build_egg)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2324, in require
    items = working_set.resolve(reqs, env, installer, extras=self.extras)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 859, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.VersionConflict: (six 1.5.2 (/usr/lib/python2.7/dist-packages), Requirement.parse('six>=1.6.0'))
make[4]: *** [all-local] Error 1
make[4]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler/py'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift'
make: *** [all] Error 2
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 2

To fix that, add another line to the Dockerfile: RUN pip install 'six==1.6.0' --force-reinstall

It should go between WORKDIR /root and RUN chmod +x ./install_all.sh.
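The last line of the traceback says that setuptools needs six>=1.6.0 while the image provides six 1.5.2 from /usr/lib/python2.7/dist-packages, which is why pinning six gets past the error. A minimal Python 3 sketch of that check (not run inside the Python 2.7 image); `installed_version` and `satisfies_minimum` are hypothetical helper names, and the naive comparison assumes plain dotted release numbers (a real resolver applies full PEP 440 rules):

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    # Naive parse: assumes plain dotted integers such as "1.5.2".
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """True if `installed` >= `minimum`, as in the requirement six>=1.6.0."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # Versions taken from the traceback above.
    print(satisfies_minimum("1.5.2", "1.6.0"))  # False: the VersionConflict
    print(satisfies_minimum("1.6.0", "1.6.0"))  # True once six is pinned
    print("six installed:", installed_version("six"))
```

Comparing integer tuples rather than strings matters here: as strings, "1.10.0" would sort before "1.6.0".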

My next problem is building TH++:

Installing TH++

+ echo
+ echo 'Installing TH++'
+ echo
+ cd /tmp/fblualib-build.vE4rCz/thpp/thpp
+ '[' 0 -eq 0 ']'
+ mv /root/thpp_build.sh build.sh
+ chmod +x build.sh
+ ./build.sh
./install_all.sh: ./build.sh: /bin/bash: bad interpreter: Text file busy
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 126

banderlog, Aug 03 '18 16:08

So I updated the Dockerfile to run sync after each chmod:

# Start with a base docker image that contains torch and cutorch.
FROM kaixhin/cuda-torch:8.0

# Install fblualib and its dependencies :
ADD install_all.sh /root/install_all.sh
ADD thpp_build.sh /root/thpp_build.sh

WORKDIR /root
RUN pip install 'six==1.6.0' --force-reinstall
RUN chmod +x ./install_all.sh; sync;
RUN ./install_all.sh

# Clone the crnn repo :
RUN git clone https://github.com/bgshih/crnn.git
RUN apt-get update && apt-get install -y \
        liblmdb-dev

WORKDIR /root/crnn/src
RUN chmod +x build_cpp.sh; sync;
RUN ./build_cpp.sh

I also updated install_all.sh (lines 139-162) to run the build scripts through /bin/bash instead of executing them directly:

echo
echo 'Installing TH++'
echo

cd $dir/thpp/thpp
if [ $current -eq 0 ]; then
  mv /root/thpp_build.sh build.sh
  chmod +x build.sh
  sleep 1
fi
/bin/bash ./build.sh
#./build.sh

echo
echo 'Installing FBLuaLib'
echo

cd $dir/fblualib/fblualib
/bin/bash ./build.sh
#./build.sh

echo
echo 'All done!'
echo
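"Text file busy" is the ETXTBSY error: the kernel refuses to execute a file that is still open for writing, which has been reported in Docker builds right after a script is written or chmod-ed. Calling /bin/bash ./build.sh sidesteps it because the program that gets exec'd is bash, which only reads build.sh. The Python sketch below mirrors that sync-then-run-through-bash pattern; `run_via_bash` is a hypothetical name, and it assumes a Unix host with bash available:

```python
import os
import shutil
import stat
import subprocess
import tempfile


def run_via_bash(script_path):
    """Run a shell script by exec'ing bash rather than the script itself."""
    # Flush pending writes, analogous to the `sync` added in the Dockerfile.
    os.sync()
    bash = shutil.which("bash") or "/bin/bash"
    # Only bash is exec'd; it opens script_path for reading, so the kernel's
    # ETXTBSY check on executed files never applies to script_path.
    result = subprocess.run(
        [bash, script_path], check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "build.sh")
        with open(path, "w") as f:
            f.write("#!/bin/bash\necho built\n")
        # chmod +x is not required when the script is passed to bash,
        # but it mirrors the original install_all.sh step.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        print(run_via_bash(path), end="")
```

The design choice is the same one made in install_all.sh: keep the chmod for compatibility, but stop relying on direct execution of a freshly written file.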

Now the TH++ build fails with an error that looks very similar to the one I got when I tried to build the network without Docker. Here is only the tail of the output:

In file included from thpp/detail/TensorGeneric.h:1:0,
                 from /root/torch/install/include/TH/THGenerateIntTypes.h:14,
                 from /root/torch/install/include/TH/THGenerateAllTypes.h:11,
                 from /tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/detail/Tensor.h:28,
                 from /tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/Tensor.h:19,
                 from /tmp/fblualib-build.IBQKAm/thpp/thpp/TensorSerialization.cpp:11:
/tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/detail/TensorGeneric.h:201:37: error: return-statement with a value, in function returning 'void' [-fpermissive]
     return THTensor_(prod)(r, t, dim);
                                     ^
make[2]: *** [CMakeFiles/thpp.dir/TensorSerialization.cpp.o] Error 1
make[1]: *** [CMakeFiles/thpp.dir/all] Error 2
make: *** [all] Error 2
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 2

Full listing: https://pastebin.com/Rvpj0c2v

banderlog, Aug 03 '18 17:08

It looks like a Torch7 problem: https://github.com/facebook/thpp/pull/42

Torch7 added a keepdim parameter to its reduction functions over a single dimension.

banderlog, Aug 03 '18 18:08
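For context, keepdim controls whether a reduction over one dimension keeps that dimension with size 1 or drops it, so it presumably changed the signatures that the three-argument THTensor_(prod)(r, t, dim) call in TensorGeneric.h was written against. The pure-Python sketch below only illustrates the semantics on a 2-D nested list; `reduce_prod` is a hypothetical helper, not Torch7's C API:

```python
from math import prod


def reduce_prod(matrix, dim, keepdim=False):
    """Product over one dimension of a 2-D nested list.

    With keepdim=True the reduced dimension is kept with size 1;
    otherwise it is removed from the result.
    """
    if dim == 0:
        out = [prod(col) for col in zip(*matrix)]  # one value per column
        return [out] if keepdim else out           # shape 1xN vs N
    if dim == 1:
        out = [prod(row) for row in matrix]        # one value per row
        return [[v] for v in out] if keepdim else out  # shape Mx1 vs M
    raise ValueError("dim must be 0 or 1 for a 2-D input")


if __name__ == "__main__":
    m = [[1, 2], [3, 4]]
    print(reduce_prod(m, 0))                # [3, 8]
    print(reduce_prod(m, 0, keepdim=True))  # [[3, 8]]
    print(reduce_prod(m, 1, keepdim=True))  # [[2], [12]]
```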

For those who run into the same problem when force-reinstalling six with RUN pip install 'six==1.6.0' --force-reinstall: after a few tries, I finally got it working by replacing that line with RUN pip install --ignore-installed six==1.6.0, per https://github.com/pypa/pip/issues/3165. Hope it helps someone.

peacherwu, Apr 11 '19 16:04
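With --ignore-installed, pip installs the requested six without first uninstalling the existing one, so the apt-provided six 1.5.2 can remain next to the new copy and whichever comes first on sys.path is the one imported. A small Python 3.8+ sketch for listing the visible copies; `find_copies` is a hypothetical helper and assumes the standard path-based metadata finders:

```python
from importlib import metadata


def find_copies(dist_name):
    """List (version, location) for every installed copy of a distribution.

    Path-based finders walk sys.path in order, so the first entry is
    normally the copy that `import` will pick up.
    """
    wanted = dist_name.lower()
    copies = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if name.lower() == wanted:
            copies.append((dist.version, str(dist.locate_file(""))))
    return copies


if __name__ == "__main__":
    for version, location in find_copies("six"):
        print(version, location)
```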