Deep-Feature-Flow

too large launch parameter: AddTakeGrad[130560,1], [64,1,1]

Open KevinQian97 opened this issue 5 years ago • 0 comments

Hi, I ran into a strange problem after changing the image scale from your [1000, 600] to [1920, 1080]. I get the following error:

Traceback (most recent call last):
  File "dff_rfcn/train_end2end.py", line 179, in <module>
    main()
  File "dff_rfcn/train_end2end.py", line 176, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "dff_rfcn/train_end2end.py", line 169, in train_net
    arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
  File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 981, in fit
    self.update_metric(eval_metric, data_batch.label)
  File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 1073, in update_metric
    self._curr_module.update_metric(eval_metric, labels)
  File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 674, in update_metric
    self._exec_group.update_metric(eval_metric, labels)
  File "/Deep-Feature-Flow/dff_rfcn/core/DataParallelExecutorGroup.py", line 481, in update_metric
    eval_metric.update(labels, texec.outputs)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/metric.py", line 318, in update
    metric.update(labels, preds)
  File "/Deep-Feature-Flow/dff_rfcn/core/metric.py", line 51, in update
    pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
  File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:12:23] /work/mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: AddTakeGrad[130560,1], [64,1,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4114ba) [0x7f89e3d464ba]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x411ad1) [0x7f89e3d46ad1]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4b7eddb) [0x7f89e84b3ddb]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e12eec) [0x7f89e8747eec]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e132cf) [0x7f89e87482cf]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4f92da4) [0x7f89e88c7da4]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2ceb179) [0x7f89e6620179]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cf1e67) [0x7f89e6626e67]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd01c4) [0x7f89e66051c4]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd44b3) [0x7f89e66094b3]
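For reference, here is a minimal Python sketch of the kind of launch-parameter check that produces this message. It assumes that mshadow rejects a CUDA launch when the grid exceeds 65535 in either dimension or the block exceeds 1024 threads. Those limits and the hypothetical scaled_grid_x helper are assumptions for illustration, not code from Deep-Feature-Flow or MXNet. Under that reading, [64,1,1] is the block and [130560,1] is the grid, and 130560 is roughly twice the assumed limit.

```python
import math

# Assumed limits, modelled on mshadow's cuda/tensor_gpu-inl.cuh; verify the
# constants in your own MXNet build before relying on them.
MAX_GRID_DIM = 65535          # assumed per-dimension grid limit
MAX_THREADS_PER_BLOCK = 1024  # assumed maximum threads per block


def launch_ok(grid, block):
    """Return True if a launch with these dims passes the assumed check.

    grid is (x, y) and block is (x, y, z), matching the format of the
    error message "AddTakeGrad[130560,1], [64,1,1]".
    """
    gx, gy = grid
    bx, by, bz = block
    return (bx * by * bz <= MAX_THREADS_PER_BLOCK
            and gx <= MAX_GRID_DIM
            and gy <= MAX_GRID_DIM)


def scaled_grid_x(grid_x, measured_wh, target_wh):
    """Hypothetical helper: estimate the grid x-size at another image scale,
    ASSUMING the grid grows linearly with image area (illustration only)."""
    measured_area = measured_wh[0] * measured_wh[1]
    target_area = target_wh[0] * target_wh[1]
    return int(math.ceil(grid_x * float(target_area) / measured_area))


if __name__ == "__main__":
    # The launch from the error message fails: 130560 > 65535.
    print(launch_ok((130560, 1), (64, 1, 1)))  # False
    # Estimated grid at the original [1000, 600] scale, under the area assumption.
    est = scaled_grid_x(130560, (1920, 1080), (1000, 600))
    print(est)
    print(launch_ok((est, 1), (64, 1, 1)))
```

If the grid really does scale with image area (again, an assumption), the same kernel would need only about 37,778 blocks at [1000, 600], which fits under the limit. That would be consistent with the error appearing only after the scale change. Also, AddTakeGrad appears from its name to be a gradient kernel rather than part of argmax_channel. Because MXNet runs operators asynchronously, an error from an earlier kernel can surface at the next blocking call, such as asnumpy().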

I tried to locate the problem and found that it happens in /Deep-Feature-Flow/dff_rfcn/core/metric.py, line 46:

pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')

If I delete asnumpy(), the error goes away, but the whole program runs much slower and a lot of other code would need to be changed. Could you tell me why this error happens and whether there are other ways to fix it?

Thanks for your help!

KevinQian97, Jul 07 '19 05:07