crnn
Failed to create an image using the supplied Dockerfile
The base image is invalid.
I ran into this problem too. Have you solved it?
Change the second line of the Dockerfile to FROM kaixhin/cuda-torch:8.0
and the build will get much further.
But it will still stop when it tries to build thrift 24.0,
at the make stage:
/usr/bin/python setup.py build
Traceback (most recent call last):
File "setup.py", line 39, in <module>
run_setup()
File "setup.py", line 36, in run_setup
zip_safe = False,
File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
_setup_distribution = dist = klass(attrs)
File "/usr/local/lib/python2.7/dist-packages/setuptools/dist.py", line 321, in __init__
_Distribution.__init__(self, attrs)
File "/usr/lib/python2.7/distutils/dist.py", line 287, in __init__
self.finalize_options()
File "/usr/local/lib/python2.7/dist-packages/setuptools/dist.py", line 389, in finalize_options
ep.require(installer=self.fetch_build_egg)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2324, in require
items = working_set.resolve(reqs, env, installer, extras=self.extras)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 859, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.VersionConflict: (six 1.5.2 (/usr/lib/python2.7/dist-packages), Requirement.parse('six>=1.6.0'))
make[4]: *** [all-local] Error 1
make[4]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler/py'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift/compiler'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/fblualib-build.eXaxOL/fbthrift/thrift'
make: *** [all] Error 2
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 2
So you need to add another line to the Dockerfile: RUN pip install 'six==1.6.0' --force-reinstall
It should be placed between WORKDIR /root
and RUN chmod +x ./install_all.sh
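The VersionConflict in the traceback boils down to a plain version comparison: the six found in /usr/lib/python2.7/dist-packages is 1.5.2, while the requirement being resolved is six>=1.6.0. Here is a minimal shell sketch of that comparison. `version_ge` is a hypothetical helper (it is not part of install_all.sh), and it assumes a `sort` that supports `-V`, as GNU coreutils does.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B.
# Hypothetical helper; relies on `sort -V` (version sort).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

installed="1.5.2"   # version reported in the traceback
required="1.6.0"    # from Requirement.parse('six>=1.6.0')

if version_ge "$installed" "$required"; then
    echo "six $installed satisfies >= $required"
else
    echo "six $installed is too old, need >= $required"
fi
```

Pinning six to 1.6.0 before install_all.sh runs is what makes this comparison pass during the thrift build.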
My next problem is building TH++:
Installing TH++
+ echo
+ echo 'Installing TH++'
+ echo
+ cd /tmp/fblualib-build.vE4rCz/thpp/thpp
+ '[' 0 -eq 0 ']'
+ mv /root/thpp_build.sh build.sh
+ chmod +x build.sh
+ ./build.sh
./install_all.sh: ./build.sh: /bin/bash: bad interpreter: Text file busy
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 126
So I updated the Dockerfile with a sync command:
# Start with a base docker image that contains torch and cutorch.
FROM kaixhin/cuda-torch:8.0

# Install fblualib and its dependencies :
ADD install_all.sh /root/install_all.sh
ADD thpp_build.sh /root/thpp_build.sh

WORKDIR /root
RUN pip install 'six==1.6.0' --force-reinstall
RUN chmod +x ./install_all.sh; sync;
RUN ./install_all.sh

# Clone the crnn repo :
RUN git clone https://github.com/bgshih/crnn.git
RUN apt-get update && apt-get install -y \
    liblmdb-dev

WORKDIR /root/crnn/src
RUN chmod +x build_cpp.sh; sync;
RUN ./build_cpp.sh
I also updated install_all.sh (the part from the TH++ step onward):
echo
echo 'Installing TH++'
echo

cd $dir/thpp/thpp
if [ $current -eq 0 ]; then
    mv /root/thpp_build.sh build.sh
    chmod +x build.sh
    sleep 1
fi
/bin/bash ./build.sh
#./build.sh

echo
echo 'Installing FBLuaLib'
echo

cd $dir/fblualib/fblualib
/bin/bash ./build.sh
#./build.sh

echo
echo 'All done!'
echo
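As far as I understand it, calling /bin/bash ./build.sh instead of ./build.sh helps because "Text file busy" (ETXTBSY) is raised when the kernel is asked to exec a file that is still open for writing. When the script is passed to an interpreter, the interpreter only reads it, so that check never applies. Below is a small self-contained sketch of the idea. `run_via_interpreter` and the temporary script are made up for illustration, and it uses `sh` rather than /bin/bash only so that it runs anywhere.

```shell
#!/bin/sh
# Sketch: run a script through an interpreter instead of exec'ing it directly.
# The interpreter only reads the file, so neither the execute bit nor the
# kernel's ETXTBSY check on exec'ed files comes into play.
run_via_interpreter() {
    sh "$1"
}

workdir=$(mktemp -d)
script="$workdir/build.sh"   # stand-in for the real build.sh
printf '#!/bin/sh\necho "build step ran"\n' > "$script"

# Note: no chmod +x here; the script is still runnable this way.
run_via_interpreter "$script"

rm -rf "$workdir"
```

With this approach the chmod and sleep in the if block are probably no longer strictly needed, but leaving them in does no harm.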
But now I get an error while building TH++,
and it looks very similar to the error I got when I tried to build the network without Docker. I'll paste only the tail:
In file included from thpp/detail/TensorGeneric.h:1:0,
from /root/torch/install/include/TH/THGenerateIntTypes.h:14,
from /root/torch/install/include/TH/THGenerateAllTypes.h:11,
from /tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/detail/Tensor.h:28,
from /tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/Tensor.h:19,
from /tmp/fblualib-build.IBQKAm/thpp/thpp/TensorSerialization.cpp:11:
/tmp/fblualib-build.IBQKAm/thpp/thpp/../thpp/detail/TensorGeneric.h:201:37: error: return-statement with a value, in function returning 'void' [-fpermissive]
return THTensor_(prod)(r, t, dim);
^
make[2]: *** [CMakeFiles/thpp.dir/TensorSerialization.cpp.o] Error 1
make[1]: *** [CMakeFiles/thpp.dir/all] Error 2
make: *** [all] Error 2
The command '/bin/sh -c ./install_all.sh' returned a non-zero code: 2
Full listing: https://pastebin.com/Rvpj0c2v
It looks like a Torch7 problem: https://github.com/facebook/thpp/pull/42
Torch7 got the change "Add a keepdim parameter for reduction functions over a single dimension."
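One rough way to check whether the Torch7 in the base image already carries the keepdim change is to grep its TH headers. This is only a sketch: `has_keepdim` is a hypothetical helper, and the header path is my assumption based on the include paths in the log above, so adjust it to wherever THTensorMath.h lives in your image.

```shell
#!/bin/sh
# has_keepdim FILE: succeed when FILE mentions "keepdim".
# Hypothetical helper for illustration only.
has_keepdim() {
    grep -q 'keepdim' "$1" 2>/dev/null
}

# Assumed header location, based on the include paths in the build log.
header="/root/torch/install/include/TH/generic/THTensorMath.h"

if has_keepdim "$header"; then
    echo "Torch7 headers mention keepdim: thpp needs the matching update"
else
    echo "no keepdim in $header (or file not found)"
fi
```

If the headers do mention keepdim, the thpp checkout pulled in by install_all.sh is likely older than the Torch7 in the image, which would fit the compile error.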
For those who hit the same problem when force-reinstalling six with
RUN pip install 'six==1.6.0' --force-reinstall
I tried a few times and finally got it working by replacing that line with
RUN pip install --ignore-installed six==1.6.0
per
https://github.com/pypa/pip/issues/3165
Hope this helps someone.
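To confirm which six the build will actually pick up after either pip command, you can ask the interpreter directly. This is a minimal sketch: `report_six` is a hypothetical helper, and it assumes the interpreter is called `python` unless you pass another one.

```shell
#!/bin/sh
# report_six [PYTHON]: print the version and location of the six module that
# the given interpreter imports, or a note when it cannot be imported.
report_six() {
    py="${1:-python}"
    "$py" -c 'import six; print("%s %s" % (six.__version__, six.__file__))' 2>/dev/null \
        || echo "six is not importable with $py"
}

report_six
```

If the reported path still points at /usr/lib/python2.7/dist-packages, the old system copy is still the one being imported.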