optimize_image.py: Every unit returns 'Grad exactly 0, failed'
I'm trying to visualize my grasp detection network, an AlexNet variant adapted for regression. Every unit I try to optimize fails with the zero-gradient error, even though the network itself works as intended. Here's the output:
python optimize_image.py --net-weights /home/joe/DeepGrasping/Models/caffeGraspTrainX_iter_10000.caffemodel --deploy-proto /home/joe/workspace/code/caffe/models/grasp/graspDeploy.prototxt --data-size 224,224 --push-layer conv1 --push-channel 1 --push-spatial 6,6 --decay 0.0001 --blur-radius 1.0 --blur-every 4 --lr-params "{'lr': 100.0}" --brave
I0526 14:49:57.373548 15181 layer_factory.hpp:77] Creating layer input
I0526 14:49:57.373570 15181 net.cpp:91] Creating Layer input
I0526 14:49:57.373580 15181 net.cpp:399] input -> data
I0526 14:49:57.373613 15181 net.cpp:141] Setting up input
I0526 14:49:57.373626 15181 net.cpp:148] Top shape: 1 3 224 224 (150528)
I0526 14:49:57.373634 15181 net.cpp:156] Memory required for data: 602112
I0526 14:49:57.373643 15181 layer_factory.hpp:77] Creating layer conv1
I0526 14:49:57.373657 15181 net.cpp:91] Creating Layer conv1
I0526 14:49:57.373667 15181 net.cpp:425] conv1 <- data
I0526 14:49:57.373677 15181 net.cpp:399] conv1 -> conv1
I0526 14:49:57.375252 15181 net.cpp:141] Setting up conv1
I0526 14:49:57.375278 15181 net.cpp:148] Top shape: 1 96 54 54 (279936)
I0526 14:49:57.375286 15181 net.cpp:156] Memory required for data: 1721856
I0526 14:49:57.375305 15181 layer_factory.hpp:77] Creating layer relu1
I0526 14:49:57.375319 15181 net.cpp:91] Creating Layer relu1
I0526 14:49:57.375329 15181 net.cpp:425] relu1 <- conv1
I0526 14:49:57.375339 15181 net.cpp:386] relu1 -> conv1 (in-place)
I0526 14:49:57.375356 15181 net.cpp:141] Setting up relu1
I0526 14:49:57.375366 15181 net.cpp:148] Top shape: 1 96 54 54 (279936)
I0526 14:49:57.375375 15181 net.cpp:156] Memory required for data: 2841600
I0526 14:49:57.375382 15181 layer_factory.hpp:77] Creating layer norm1
I0526 14:49:57.375393 15181 net.cpp:91] Creating Layer norm1
I0526 14:49:57.375401 15181 net.cpp:425] norm1 <- conv1
I0526 14:49:57.375411 15181 net.cpp:399] norm1 -> norm1
I0526 14:49:57.375424 15181 net.cpp:141] Setting up norm1
I0526 14:49:57.375434 15181 net.cpp:148] Top shape: 1 96 54 54 (279936)
I0526 14:49:57.375442 15181 net.cpp:156] Memory required for data: 3961344
I0526 14:49:57.375450 15181 layer_factory.hpp:77] Creating layer pool1
I0526 14:49:57.375461 15181 net.cpp:91] Creating Layer pool1
I0526 14:49:57.375469 15181 net.cpp:425] pool1 <- norm1
I0526 14:49:57.375479 15181 net.cpp:399] pool1 -> pool1
I0526 14:49:57.375497 15181 net.cpp:141] Setting up pool1
I0526 14:49:57.375507 15181 net.cpp:148] Top shape: 1 96 27 27 (69984)
I0526 14:49:57.375514 15181 net.cpp:156] Memory required for data: 4241280
I0526 14:49:57.375522 15181 layer_factory.hpp:77] Creating layer conv2
I0526 14:49:57.375536 15181 net.cpp:91] Creating Layer conv2
I0526 14:49:57.375545 15181 net.cpp:425] conv2 <- pool1
I0526 14:49:57.375555 15181 net.cpp:399] conv2 -> conv2
I0526 14:49:57.385344 15181 net.cpp:141] Setting up conv2
I0526 14:49:57.385375 15181 net.cpp:148] Top shape: 1 256 27 27 (186624)
I0526 14:49:57.385383 15181 net.cpp:156] Memory required for data: 4987776
I0526 14:49:57.385401 15181 layer_factory.hpp:77] Creating layer relu2
I0526 14:49:57.385416 15181 net.cpp:91] Creating Layer relu2
I0526 14:49:57.385426 15181 net.cpp:425] relu2 <- conv2
I0526 14:49:57.385437 15181 net.cpp:386] relu2 -> conv2 (in-place)
I0526 14:49:57.385449 15181 net.cpp:141] Setting up relu2
I0526 14:49:57.385459 15181 net.cpp:148] Top shape: 1 256 27 27 (186624)
I0526 14:49:57.385467 15181 net.cpp:156] Memory required for data: 5734272
I0526 14:49:57.385474 15181 layer_factory.hpp:77] Creating layer norm2
I0526 14:49:57.385485 15181 net.cpp:91] Creating Layer norm2
I0526 14:49:57.385493 15181 net.cpp:425] norm2 <- conv2
I0526 14:49:57.385504 15181 net.cpp:399] norm2 -> norm2
I0526 14:49:57.385517 15181 net.cpp:141] Setting up norm2
I0526 14:49:57.385526 15181 net.cpp:148] Top shape: 1 256 27 27 (186624)
I0526 14:49:57.385535 15181 net.cpp:156] Memory required for data: 6480768
I0526 14:49:57.385541 15181 layer_factory.hpp:77] Creating layer pool2
I0526 14:49:57.385553 15181 net.cpp:91] Creating Layer pool2
I0526 14:49:57.385561 15181 net.cpp:425] pool2 <- norm2
I0526 14:49:57.385571 15181 net.cpp:399] pool2 -> pool2
I0526 14:49:57.385584 15181 net.cpp:141] Setting up pool2
I0526 14:49:57.385594 15181 net.cpp:148] Top shape: 1 256 13 13 (43264)
I0526 14:49:57.385601 15181 net.cpp:156] Memory required for data: 6653824
I0526 14:49:57.385609 15181 layer_factory.hpp:77] Creating layer conv3
I0526 14:49:57.385625 15181 net.cpp:91] Creating Layer conv3
I0526 14:49:57.385634 15181 net.cpp:425] conv3 <- pool2
I0526 14:49:57.385644 15181 net.cpp:399] conv3 -> conv3
I0526 14:49:57.414319 15181 net.cpp:141] Setting up conv3
I0526 14:49:57.414361 15181 net.cpp:148] Top shape: 1 384 13 13 (64896)
I0526 14:49:57.414374 15181 net.cpp:156] Memory required for data: 6913408
I0526 14:49:57.414404 15181 layer_factory.hpp:77] Creating layer relu3
I0526 14:49:57.414429 15181 net.cpp:91] Creating Layer relu3
I0526 14:49:57.414446 15181 net.cpp:425] relu3 <- conv3
I0526 14:49:57.414464 15181 net.cpp:386] relu3 -> conv3 (in-place)
I0526 14:49:57.414485 15181 net.cpp:141] Setting up relu3
I0526 14:49:57.414507 15181 net.cpp:148] Top shape: 1 384 13 13 (64896)
I0526 14:49:57.414520 15181 net.cpp:156] Memory required for data: 7172992
I0526 14:49:57.414535 15181 layer_factory.hpp:77] Creating layer conv4
I0526 14:49:57.414561 15181 net.cpp:91] Creating Layer conv4
I0526 14:49:57.414578 15181 net.cpp:425] conv4 <- conv3
I0526 14:49:57.414597 15181 net.cpp:399] conv4 -> conv4
I0526 14:49:57.435600 15181 net.cpp:141] Setting up conv4
I0526 14:49:57.435652 15181 net.cpp:148] Top shape: 1 384 13 13 (64896)
I0526 14:49:57.435664 15181 net.cpp:156] Memory required for data: 7432576
I0526 14:49:57.435689 15181 layer_factory.hpp:77] Creating layer relu4
I0526 14:49:57.435714 15181 net.cpp:91] Creating Layer relu4
I0526 14:49:57.435731 15181 net.cpp:425] relu4 <- conv4
I0526 14:49:57.435748 15181 net.cpp:386] relu4 -> conv4 (in-place)
I0526 14:49:57.435770 15181 net.cpp:141] Setting up relu4
I0526 14:49:57.435786 15181 net.cpp:148] Top shape: 1 384 13 13 (64896)
I0526 14:49:57.435798 15181 net.cpp:156] Memory required for data: 7692160
I0526 14:49:57.435811 15181 layer_factory.hpp:77] Creating layer conv5
I0526 14:49:57.435837 15181 net.cpp:91] Creating Layer conv5
I0526 14:49:57.435853 15181 net.cpp:425] conv5 <- conv4
I0526 14:49:57.435870 15181 net.cpp:399] conv5 -> conv5
I0526 14:49:57.449239 15181 net.cpp:141] Setting up conv5
I0526 14:49:57.449275 15181 net.cpp:148] Top shape: 1 256 13 13 (43264)
I0526 14:49:57.449287 15181 net.cpp:156] Memory required for data: 7865216
I0526 14:49:57.449317 15181 layer_factory.hpp:77] Creating layer relu5x
I0526 14:49:57.449338 15181 net.cpp:91] Creating Layer relu5x
I0526 14:49:57.449354 15181 net.cpp:425] relu5x <- conv5
I0526 14:49:57.449371 15181 net.cpp:386] relu5x -> conv5 (in-place)
I0526 14:49:57.449391 15181 net.cpp:141] Setting up relu5x
I0526 14:49:57.449407 15181 net.cpp:148] Top shape: 1 256 13 13 (43264)
I0526 14:49:57.449419 15181 net.cpp:156] Memory required for data: 8038272
I0526 14:49:57.449432 15181 layer_factory.hpp:77] Creating layer pool5x
I0526 14:49:57.449451 15181 net.cpp:91] Creating Layer pool5x
I0526 14:49:57.449465 15181 net.cpp:425] pool5x <- conv5
I0526 14:49:57.449481 15181 net.cpp:399] pool5x -> pool5x
I0526 14:49:57.449503 15181 net.cpp:141] Setting up pool5x
I0526 14:49:57.449523 15181 net.cpp:148] Top shape: 1 256 6 6 (9216)
I0526 14:49:57.449535 15181 net.cpp:156] Memory required for data: 8075136
I0526 14:49:57.449548 15181 layer_factory.hpp:77] Creating layer fc6x
I0526 14:49:57.449581 15181 net.cpp:91] Creating Layer fc6x
I0526 14:49:57.449594 15181 net.cpp:425] fc6x <- pool5x
I0526 14:49:57.449612 15181 net.cpp:399] fc6x -> fc6x
I0526 14:49:57.455247 15181 net.cpp:141] Setting up fc6x
I0526 14:49:57.455301 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.455312 15181 net.cpp:156] Memory required for data: 8077184
I0526 14:49:57.455333 15181 layer_factory.hpp:77] Creating layer relu6x
I0526 14:49:57.455354 15181 net.cpp:91] Creating Layer relu6x
I0526 14:49:57.455375 15181 net.cpp:425] relu6x <- fc6x
I0526 14:49:57.455396 15181 net.cpp:386] relu6x -> fc6x (in-place)
I0526 14:49:57.455418 15181 net.cpp:141] Setting up relu6x
I0526 14:49:57.455435 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.455447 15181 net.cpp:156] Memory required for data: 8079232
I0526 14:49:57.455461 15181 layer_factory.hpp:77] Creating layer drop6x
I0526 14:49:57.455477 15181 net.cpp:91] Creating Layer drop6x
I0526 14:49:57.455492 15181 net.cpp:425] drop6x <- fc6x
I0526 14:49:57.455508 15181 net.cpp:386] drop6x -> fc6x (in-place)
I0526 14:49:57.455528 15181 net.cpp:141] Setting up drop6x
I0526 14:49:57.455544 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.455556 15181 net.cpp:156] Memory required for data: 8081280
I0526 14:49:57.455569 15181 layer_factory.hpp:77] Creating layer fc7x
I0526 14:49:57.455587 15181 net.cpp:91] Creating Layer fc7x
I0526 14:49:57.455600 15181 net.cpp:425] fc7x <- fc6x
I0526 14:49:57.455620 15181 net.cpp:399] fc7x -> fc7x
I0526 14:49:57.455997 15181 net.cpp:141] Setting up fc7x
I0526 14:49:57.456015 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.456028 15181 net.cpp:156] Memory required for data: 8083328
I0526 14:49:57.456046 15181 layer_factory.hpp:77] Creating layer relu7x
I0526 14:49:57.456064 15181 net.cpp:91] Creating Layer relu7x
I0526 14:49:57.456079 15181 net.cpp:425] relu7x <- fc7x
I0526 14:49:57.456094 15181 net.cpp:386] relu7x -> fc7x (in-place)
I0526 14:49:57.456110 15181 net.cpp:141] Setting up relu7x
I0526 14:49:57.456125 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.456135 15181 net.cpp:156] Memory required for data: 8085376
I0526 14:49:57.456142 15181 layer_factory.hpp:77] Creating layer drop7x
I0526 14:49:57.456152 15181 net.cpp:91] Creating Layer drop7x
I0526 14:49:57.456159 15181 net.cpp:425] drop7x <- fc7x
I0526 14:49:57.456168 15181 net.cpp:386] drop7x -> fc7x (in-place)
I0526 14:49:57.456178 15181 net.cpp:141] Setting up drop7x
I0526 14:49:57.456187 15181 net.cpp:148] Top shape: 1 512 (512)
I0526 14:49:57.456194 15181 net.cpp:156] Memory required for data: 8087424
I0526 14:49:57.456202 15181 layer_factory.hpp:77] Creating layer fc8x
I0526 14:49:57.456212 15181 net.cpp:91] Creating Layer fc8x
I0526 14:49:57.456218 15181 net.cpp:425] fc8x <- fc7x
I0526 14:49:57.456229 15181 net.cpp:399] fc8x -> fc8x
I0526 14:49:57.456254 15181 net.cpp:141] Setting up fc8x
I0526 14:49:57.456270 15181 net.cpp:148] Top shape: 1 6 (6)
I0526 14:49:57.456281 15181 net.cpp:156] Memory required for data: 8087448
I0526 14:49:57.456311 15181 net.cpp:219] fc8x does not need backward computation.
I0526 14:49:57.456324 15181 net.cpp:219] drop7x does not need backward computation.
I0526 14:49:57.456332 15181 net.cpp:219] relu7x does not need backward computation.
I0526 14:49:57.456339 15181 net.cpp:219] fc7x does not need backward computation.
I0526 14:49:57.456347 15181 net.cpp:219] drop6x does not need backward computation.
I0526 14:49:57.456356 15181 net.cpp:219] relu6x does not need backward computation.
I0526 14:49:57.456362 15181 net.cpp:219] fc6x does not need backward computation.
I0526 14:49:57.456370 15181 net.cpp:219] pool5x does not need backward computation.
I0526 14:49:57.456378 15181 net.cpp:219] relu5x does not need backward computation.
I0526 14:49:57.456387 15181 net.cpp:219] conv5 does not need backward computation.
I0526 14:49:57.456394 15181 net.cpp:219] relu4 does not need backward computation.
I0526 14:49:57.456403 15181 net.cpp:219] conv4 does not need backward computation.
I0526 14:49:57.456410 15181 net.cpp:219] relu3 does not need backward computation.
I0526 14:49:57.456418 15181 net.cpp:219] conv3 does not need backward computation.
I0526 14:49:57.456426 15181 net.cpp:219] pool2 does not need backward computation.
I0526 14:49:57.456435 15181 net.cpp:219] norm2 does not need backward computation.
I0526 14:49:57.456444 15181 net.cpp:219] relu2 does not need backward computation.
I0526 14:49:57.456451 15181 net.cpp:219] conv2 does not need backward computation.
I0526 14:49:57.456467 15181 net.cpp:219] pool1 does not need backward computation.
I0526 14:49:57.456477 15181 net.cpp:219] norm1 does not need backward computation.
I0526 14:49:57.456485 15181 net.cpp:219] relu1 does not need backward computation.
I0526 14:49:57.456493 15181 net.cpp:219] conv1 does not need backward computation.
I0526 14:49:57.456501 15181 net.cpp:219] input does not need backward computation.
I0526 14:49:57.456508 15181 net.cpp:261] This network produces output fc8x
I0526 14:49:57.456539 15181 net.cpp:274] Network initialization done.
I0526 14:49:57.506186 15181 net.cpp:804] Ignoring source layer data
I0526 14:49:57.512642 15181 net.cpp:804] Ignoring source layer loss
Starting optimization with the following parameters:
FindParams:
  blur_every: 4
  blur_radius: 1.0
  decay: 0.0001
  lr_params: {'lr': 100.0}
  lr_policy: constant
  max_iter: 500
  push_channel: 1
  push_dir: 1
  push_layer: conv1
  push_spatial: (6, 6)
  push_unit: (1, 6, 6)
  px_abs_benefit_percentile: 0
  px_benefit_percentile: 0
  rand_seed: 0
  small_norm_percentile: 0
  small_val_percentile: 0
  start_at: mean_plus_rand
0 push unit: (1, 6, 6) with value 0
Max idx: (1, 10, 40) with value 47.5277
X: -48.5211765318 42.8585564122 3869.55458723
grad: 0.0 0.0 0.0
Grad exactly 0, failed
Metaresult: grad 0 failure
Hmm... do you have force_backward: true in your prototxt? (I guess yes, because the script should be checking for this).
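For reference, it's a single top-level line in the deploy prototxt. A one-line sketch (exact placement among the other top-level fields doesn't matter):

force_backward: true   # top-level field in graspDeploy.prototxt; without it Caffe skips backward for every layer in a deploy net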
Separately, the reason this particular unit is failing could be that its activation is exactly 0:
push unit: (1, 6, 6) with value 0
which will produce a 0 gradient through the ReLU. However, other units are nonzero:
Max idx: (1, 10, 40) with value 47.5277
so you could try optimizing those. You might also try a different spatial location: (6,6) is more appropriate for the conv{3,4,5} layers, where it's the center unit. Your conv1 layer has spatial shape (54,54), so try optimizing the unit at (27,27) instead of (6,6), which sits near the edge.
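To make the ReLU point concrete, here's a small numpy sketch (not code from the toolbox, just an illustration of the zero-slope ReLU backward rule): the gradient is passed through only where the output is positive, so a unit whose recorded activation is exactly 0 sends nothing back.

import numpy as np

def relu_backward(top_diff, top_data):
    # ReLU backward for the zero-slope case: gradient flows only where the output is > 0
    return top_diff * (top_data > 0)

top_data = np.array([0.0, 47.5277])   # the (1, 6, 6) unit vs. the max-activation unit from your log
top_diff = np.array([1.0, 1.0])       # a push of 1 injected at each unit
print(relu_backward(top_diff, top_data))   # -> [0. 1.]: the zero unit contributes no gradient

# For the spatial-location suggestion: the center of conv1's (54, 54) map is
print((54 // 2, 54 // 2))                  # -> (27, 27)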
Finally, if all else fails, put this right before the print 'Grad exactly 0, failed' line:
shownet(self.net)
assert False
It will show you the forward activation (data) and backward gradient (diff) min/max values layer by layer. If you run with ipython --pdb -- ./optimize_image.py args..., you'll be dropped into the debugger at the assert False line, which will let you poke around with the network that produced the 0 gradient.
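If you'd rather inspect things by hand once you're in the debugger, something along these lines prints a similar per-layer summary through the standard pycaffe blob interface (a rough sketch, not shownet's exact output):

# Rough equivalent of shownet's summary: per-layer min/max of activations (data) and gradients (diff)
for name, blob in self.net.blobs.items():
    print('%-8s data (min %.4g, max %.4g)  diff (min %.4g, max %.4g)' % (
        name, blob.data.min(), blob.data.max(), blob.diff.min(), blob.diff.max()))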
Thanks for the swift, thorough response.
force_backward: true was not present! It works now; I think the check in the script needs to be improved.
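For what it's worth, the script could catch this up front by parsing the prototxt before building the net. A rough sketch of such a check (the hard-coded path is just a placeholder for the --deploy-proto argument):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Sketch of an up-front check: refuse to start if the deploy prototxt
# doesn't request backward computation.
net_param = caffe_pb2.NetParameter()
with open('graspDeploy.prototxt') as f:      # placeholder path
    text_format.Merge(f.read(), net_param)
if not net_param.force_backward:
    raise Exception('optimize_image.py needs force_backward: true in the deploy prototxt')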