densecap
CUDA runtime error: too many resources requested for launch
When training the model with "train.lua", there is a problem:
Processed image 122.jpg (4 / 1000) of split 1, detected 191 regions
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2734/cutorch/lib/THC/generated/../generic/THCTensorSort.cu line=153 error=7 : too many resources requested for launch
/home/.../torch/install/bin/luajit: /home/.../torch/install/share/lua/5.1/nn/Container.lua:67: In 1 module of nn.Sequential:
I am using an Nvidia K80.
I have not seen that error before - does it happen consistently or was it a one-time error?
Were there other jobs running on the same GPU at the same time?
When I use "th train.lua -debug_max_train_images 3", it trains fine; any value smaller than 4 works.
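For reference, the two runs side by side (the flag name is exactly as quoted above; the second line assumes a plain invocation with no extra flags):

th train.lua -debug_max_train_images 3   # restricted to 3 training images: runs without error
th train.lua                             # full training set: hits the THCTensorSort error on image 4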
I have the same error, but I am training on my own data, and the error is persistent: after a few iterations of training it fails like this:

iter 82: mid_box_reg_loss: 0.004, captioning_loss: 22.440, end_objectness_loss: 0.012, mid_objectness_loss: 0.056, end_box_reg_loss: 0.003, [total: 22.516]
iter 83: mid_box_reg_loss: 0.004, captioning_loss: 24.903, end_objectness_loss: 0.011, mid_objectness_loss: 0.069, end_box_reg_loss: 0.003, [total: 24.990]
iter 84: mid_box_reg_loss: 0.001, captioning_loss: 18.798, end_objectness_loss: 0.003, mid_objectness_loss: 0.077, end_box_reg_loss: 0.003, [total: 18.883]
iter 85: mid_box_reg_loss: 0.004, captioning_loss: 22.840, end_objectness_loss: 0.020, mid_objectness_loss: 0.061, end_box_reg_loss: 0.004, [total: 22.928]
iter 86: mid_box_reg_loss: 0.003, captioning_loss: 15.013, end_objectness_loss: 0.014, mid_objectness_loss: 0.059, end_box_reg_loss: 0.004, [total: 15.092]
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9137/cutorch/lib/THC/generated/../generic/THCTensorSort.cu line=153 error=7 : too many resources requested for launch
/home/nady/torch/install/bin/luajit: /home/nady/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 2 module of nn.ParallelTable:
/home/nady/torch/install/share/lua/5.1/nn/THNN.lua:110: cuda runtime error (7) : too many resources requested for launch at /tmp/luarocks_cutorch-scm-1-9137/cutorch/lib/THC/generated/../generic/THCTensorSort.cu:153
stack traceback:
[C]: in function 'v'
/home/nady/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'LookupTable_accGradParameters'
/home/nady/torch/install/share/lua/5.1/nn/LookupTable.lua:105: in function </home/nady/torch/install/share/lua/5.1/nn/LookupTable.lua:96>
[C]: in function 'xpcall'
/home/nady/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/nady/torch/install/share/lua/5.1/nn/ParallelTable.lua:27: in function 'accGradParameters'
/home/nady/torch/install/share/lua/5.1/nn/Module.lua:32: in function </home/nady/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/home/nady/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/nady/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function 'backward'
...
/home/nady/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval'
/home/nady/torch/install/share/lua/5.1/nngraph/gmodule.lua:454: in function 'updateGradInput'
/home/nady/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
./densecap/DenseCapModel.lua:353: in function 'backward'
./densecap/DenseCapModel.lua:471: in function 'forward_backward'
train_char.lua:90: in function 'lossFun'
train_char.lua:120: in main chunk
I believe this is fixed if you use the latest cunn: luarocks install cunn. But don't hold me to it.
cunn did not compile for me. I could solve the problem by reinstalling the stack:

luarocks install torch
luarocks install cutorch
luarocks install cunn
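After reinstalling, a quick sanity check that the rebuilt cutorch/cunn load and pick up the GPU (a sketch, assuming th's -e flag; cutorch.getDeviceProperties is the standard cutorch call):

th -e "require 'cutorch'; require 'cunn'; print(cutorch.getDeviceProperties(cutorch.getDevice()).name)"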