deepcut
Runtime error
Could you please tell me what the problem with my demo is?
error.txt
...
I1016 22:46:16.365223 24943 net.cpp:816] Ignoring source layer loss_loc
I1016 22:46:16.374922 24943 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
F1016 22:46:17.488354 24943 syncedmem.cpp:136] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
Hi, can you try changing this line https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47 to caffe.set_mode_cpu();? I always use a GPU, and it never occurred to me that people might not have GPUs with enough memory, sorry!
It's actually very difficult to tell from this log what the error is; I've never seen anything like it. So how exactly did you build Caffe? You wrote "After applying the solution from issue 1799": what was this fix?
Here: https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/models/finetune_flickr_style/solver.prototxt#L17. I uncommented the last line to solve the "Cannot use GPU in CPU-only Caffe" issue.
Actually, I installed Caffe locally (without sudo/root access) on a Red Hat-based cluster. I changed Makefile.config as follows to match my system configuration:
CXXFLAGS += -std=c++11
CPU_ONLY := 1
BLAS := mkl
I also commented out the following part myself, similarly to softmax_loss_layer.cpp: https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/src/caffe/layers/softmax_loss_vec_layer.cpp#L236-L251
I couldn't run "make solver-callback" from your instructions, as there was no "solver-callback:" target in the Makefile!
I also made your suggested change, "caffe.set_mode_cpu();", in https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47
"make solver-callback": this has to be executed not in the Caffe directory, but in the directory of the solver.
Can you run the CNN-only demo as described here: https://github.com/eldar/deepcut-cnn/#installation-instructions, adding the use_cpu flag like so:
python ./pose_demo.py image.png --out_name=prediction
This will at least confirm that the CNN part is running.
After debugging, I was able to run "python ./pose_demo.py image.png --out_name=prediction", but "make solver-callback" gives the following log:
[ 50%] Building CXX object CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[3]: *** [CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o] Error 1
make[2]: *** [CMakeFiles/solver-callback.dir/all] Error 2
make[1]: *** [CMakeFiles/solver-callback.dir/rule] Error 2
make: *** [solver-callback] Error 2
I used this command to solve the above error:
cmake . -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=c++ -DGUROBI_ROOT_DIR=/usr/global/gurobi/gurobi651/linux64 -DGUROBI_VERSION=65
The GCC and Gurobi versions should be compatible in this setup. I finally got it working on my system.
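The `cc1plus: error: unrecognized command line option "-std=c++11"` failure above is what GCC releases older than 4.7 report: GCC 4.3 to 4.6 only accept the draft spelling `-std=c++0x`, and earlier releases have no C++11 mode at all. That is most likely why pointing CMake at a different `c++` compiler helped. A minimal sketch for checking which spelling the `c++` on your `PATH` accepts; the helper names `parse_gcc_version` and `cxx11_flag` are hypothetical and not part of deepcut or CMake:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(version_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the first line of `gcc --version` output."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"no version number in {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def cxx11_flag(major: int, minor: int) -> Optional[str]:
    """Return the C++11 dialect flag a given GCC version accepts, or None."""
    if (major, minor) >= (4, 7):
        return "-std=c++11"  # spelling accepted since GCC 4.7
    if (major, minor) >= (4, 3):
        return "-std=c++0x"  # draft-standard spelling used by GCC 4.3 to 4.6
    return None  # no C++11 mode at all


if __name__ == "__main__":
    compiler = shutil.which("c++")
    if compiler is None:
        print("no 'c++' on PATH")
    else:
        result = subprocess.run([compiler, "--version"],
                                capture_output=True, text=True)
        try:
            major, minor = parse_gcc_version(result.stdout)
            print(f"{compiler}: GCC {major}.{minor} -> {cxx11_flag(major, minor)}")
        except ValueError as err:
            print(f"could not parse compiler version: {err}")
```

The version table only models GCC's history; if `c++` resolves to another compiler (for example clang), the printed flag is not meaningful.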
Segmentation fault after running the demo:
...
I1020 11:20:43.944026 15336 net.cpp:228] conv1 does not need backward computation.
I1020 11:20:43.944032 15336 net.cpp:270] This network produces output loc_pred
I1020 11:20:43.944036 15336 net.cpp:270] This network produces output next_pred
I1020 11:20:43.944042 15336 net.cpp:270] This network produces output prob
I1020 11:20:43.944288 15336 net.cpp:283] Network initialization done.
I1020 11:20:44.850095 15336 net.cpp:816] Ignoring source layer data
I1020 11:20:44.850126 15336 net.cpp:816] Ignoring source layer label_data_1_split
I1020 11:20:44.902542 15336 net.cpp:816] Ignoring source layer res4b4_up_pose
I1020 11:20:44.902570 15336 net.cpp:816] Ignoring source layer crop_res4b4
I1020 11:20:44.902576 15336 net.cpp:816] Ignoring source layer loss_part_res4b4
I1020 11:20:44.902582 15336 net.cpp:816] Ignoring source layer res4b12_up_pose
I1020 11:20:44.902587 15336 net.cpp:816] Ignoring source layer crop_res4b12
I1020 11:20:44.902593 15336 net.cpp:816] Ignoring source layer loss_part_res4b12
I1020 11:20:44.902909 15336 net.cpp:816] Ignoring source layer loss_part_res5c
I1020 11:20:44.903682 15336 net.cpp:816] Ignoring source layer loss_loc
I1020 11:20:44.912511 15336 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
/usr/global/matlab/R2015a/bin/matlab: line 1: 15216 Segmentation fault pbs_taskset matlab-bin $@
Hey, I can't tell from the log exactly what the problem is, but it could be that you didn't set the Gurobi license file correctly. The location is set here in the code, and you can modify it: https://github.com/eldar/deepcut/blob/master/lib/pose/exp_params.m#L18. You can obtain an academic license for free from the Gurobi website.
P.S. In the next couple of days we will update the repository with a completely new solver that runs fast and doesn't require any license.
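Gurobi itself looks up its license file through the `GRB_LICENSE_FILE` environment variable, while deepcut takes the location from `exp_params.m`. A minimal sketch, assuming you only want to confirm that a candidate license path actually exists before starting the demo; `find_gurobi_license` is a hypothetical helper and does not read `exp_params.m` itself:

```python
import os
from typing import Mapping, Optional


def find_gurobi_license(explicit_path: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first candidate license path that is an existing file, else None.

    Candidates, in order: an explicitly given path (e.g. the one you set in
    exp_params.m), then the GRB_LICENSE_FILE environment variable.
    """
    env = os.environ if env is None else env
    for candidate in (explicit_path, env.get("GRB_LICENSE_FILE")):
        if candidate:
            path = os.path.expanduser(candidate)
            if os.path.isfile(path):
                return path
    return None


if __name__ == "__main__":
    found = find_gurobi_license()
    print(found or "no license file found via GRB_LICENSE_FILE")
```

Passing the path from `exp_params.m` as `explicit_path` checks the same file deepcut will use; a `None` result means neither candidate points at a file.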
Hi Eldar,
Thanks for your reply. I followed all the instructions in README.md, including the Gurobi license setup. I don't know whether the MATLAB version matters, but there is an error when I run ./start_matlab.sh:
< M A T L A B (R) >
Copyright 1984-2015 The MathWorks, Inc.
R2015a (8.5.0.197613) 64-bit (glnxa64)
February 12, 2015
To get started, type one of these: helpwin, helpdesk, or demo. For product information, visit www.mathworks.com.
Pose startup done
Academic License
Error using dbstop Not enough input arguments.
Can you modify the start_matlab.sh script, or just start MATLAB with this command instead?
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 matlab
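The `LD_PRELOAD` workaround makes MATLAB load the system `libstdc++` ahead of the older copy bundled with MATLAB, which MEX files compiled with a newer GCC (such as Caffe's MATLAB interface) typically need. A minimal Python sketch of the same launch, with hypothetical helper names; note that `/usr/lib/x86_64-linux-gnu/` is the Debian/Ubuntu layout, and on a Red Hat-based cluster the library is usually `/usr/lib64/libstdc++.so.6` instead (an assumption, so check your system):

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Path used in the thread (Debian/Ubuntu multiarch layout). ASSUMPTION: on
# Red Hat-based systems the system copy is usually /usr/lib64/libstdc++.so.6.
DEFAULT_LIBSTDCXX = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"


def preload_env(lib: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with `lib` prepended to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic loader accepts space- or colon-separated LD_PRELOAD lists.
    env["LD_PRELOAD"] = f"{lib} {existing}".strip()
    return env


def launch_matlab(lib: str = DEFAULT_LIBSTDCXX,
                  args: Optional[List[str]] = None) -> int:
    """Start `matlab` with the given libstdc++ preloaded; returns its exit code."""
    if not os.path.isfile(lib):
        raise FileNotFoundError(f"libstdc++ not found at {lib}")
    return subprocess.call(["matlab", *(args or [])], env=preload_env(lib))
```

For example, `launch_matlab("/usr/lib64/libstdc++.so.6")` on a Red Hat-style system. Failing early when the library path does not exist avoids silently starting MATLAB without the preload.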
Yes. I then ran "dbstop if error" inside MATLAB, and got the following error:
...
I1021 11:12:10.756536 2446 net.cpp:270] This network produces output next_pred
I1021 11:12:10.756551 2446 net.cpp:270] This network produces output prob
I1021 11:12:10.757047 2446 net.cpp:283] Network initialization done.
Unexpected Standard exception from MEX file.
What() is:basic_string::append
..
Error in caffe.Net/copy_from (line 123)
caffe_('net_copy_from', self.hNet_self, weights_file);
Error in caffe.get_net (line 34)
net.copy_from(weights_file);
Error in caffe.Net (line 31)
self = caffe.get_net(varargin{:});
Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');
Error in demo_multiperson (line 9)
cnn_cache_features( experiment_index, 'test', image_index, 1);
123 caffe_('net_copy_from', self.hNet_self, weights_file);
Can you stop the debugger on this line:
Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');
and check whether net_def_file points to an existing model definition file (somewhere in
It seems fine! Could it be related to copying a huge model file?
...
Cleared 0 solvers and 0 stand-alone nets
52 net = caffe.Net(net_def_file, net_bin_file, 'test');
K>> net_def_file
net_def_file = /gpfs/work/f/fuf111/deepcut/models/ResNet-101-FCN_out_14_sigmoid_locreg_allpairs_test.prototxt
K>> net_bin_file
net_bin_file = /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
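The log shows `Network initialization done.` before the exception, and the stack trace ends in `caffe.Net/copy_from`, so the prototxt was parsed and the failure happened while reading the weights file. A minimal sketch of the same path check done in the debugger, which also flags empty or unreadable files; `check_model_files` is a hypothetical helper, and the paths are the ones printed above:

```python
import os
from typing import Dict


def check_model_files(net_def_file: str, net_bin_file: str) -> Dict[str, str]:
    """Return {variable name: problem}; an empty dict means nothing obvious is wrong."""
    problems: Dict[str, str] = {}
    for name, path in (("net_def_file", net_def_file), ("net_bin_file", net_bin_file)):
        if not os.path.isfile(path):
            problems[name] = f"missing: {path}"
        elif not os.access(path, os.R_OK):
            problems[name] = f"not readable: {path}"
        elif os.path.getsize(path) == 0:
            problems[name] = f"empty: {path}"
    return problems


if __name__ == "__main__":
    print(check_model_files(
        "/gpfs/work/f/fuf111/deepcut/models/ResNet-101-FCN_out_14_sigmoid_locreg_allpairs_test.prototxt",
        "/gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel",
    ))
```

An empty result only rules out missing, unreadable or zero-byte files; a truncated or corrupted download can still pass, which is what a checksum comparison is for.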
Sorry, it's quite difficult to say what's wrong without a proper error log. The model definitely fits on a 12 GB GPU. Maybe the file was corrupted during download? Here's the hash for mine:
deepercut-models$ md5sum ResNet-101-mpii-multiperson.caffemodel
a1aa7fb45c4f1a0e90087d6ddac24cf1 ResNet-101-mpii-multiperson.caffemodel
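The same check can be scripted in Python. This minimal sketch reads the file in 1 MiB chunks, so a large `.caffemodel` is never loaded into memory at once, and compares the result with the hash posted above. `md5_of_file` and `matches_expected` are hypothetical names, and the comparison assumes your download should be byte-identical to the author's copy:

```python
import hashlib

# md5sum posted in the thread for the author's copy of the model.
EXPECTED_MD5 = "a1aa7fb45c4f1a0e90087d6ddac24cf1"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("ResNet-101-mpii-multiperson.caffemodel")` run from the model directory; `False` means the local file differs from the author's copy and is worth downloading again.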