Fewshot_Detection

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Status: Open. wangxiaoshuai223 opened this issue 4 years ago (2 comments).

Sorry to trouble you. When I run

    python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23

I get the following RuntimeError:

    Traceback (most recent call last):
      File "train_meta.py", line 325, in <module>
        train(epoch)
      File "train_meta.py", line 218, in train
        output = model(data, metax, mask)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 199, in forward
        dynamic_weights = self.meta_forward(metax, mask)
      File "/home/m/Fewshot_Detection-master/darknet_meta.py", line 122, in meta_forward
        metax = model(metax)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
        input = module(input)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 282, in forward
        self.padding, self.dilation, self.groups)
      File "/home/m/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
        return f(input, weight, bias)
    RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

wangxiaoshuai223, Apr 29 '20 14:04

Same problem here. When I set torch.backends.cudnn.enabled = False at the top of train_meta.py, a different error is raised at the same place in the code, as follows. Have you solved this error? If so, could you give me some advice? Thanks a lot.

      File "/home/sun/projects/Fewshot_Detection/darknet_meta.py", line 199, in forward
        dynamic_weights = self.meta_forward(metax, mask)
      File "/home/sun/projects/Fewshot_Detection/darknet_meta.py", line 122, in meta_forward
        metax = model(metax)
      File "/home/sun/anaconda3/envs/FR/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/sun/anaconda3/envs/FR/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
        input = module(input)
      File "/home/sun/anaconda3/envs/FR/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/sun/anaconda3/envs/FR/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 282, in forward
        self.padding, self.dilation, self.groups)
      File "/home/sun/anaconda3/envs/FR/lib/python2.7/site-packages/torch/nn/functional.py", line 90, in conv2d
        return f(input, weight, bias)
    RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1518238441757/work/torch/lib/THC/THCBlas.cu:247

whsun21, Jul 09 '20 02:07
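For reference, the workaround tried in the comment above is a single global switch. The sketch below assumes it is placed at the very top of train_meta.py, before the model is built; that placement is an assumption, not something the repository prescribes. With cuDNN disabled, PyTorch runs GPU convolutions with its own CUDA kernels, which in PyTorch builds of that era compute conv2d as im2col followed by a cuBLAS GEMM. That would be consistent with the failure moving from cuDNN into THCBlas.cu instead of disappearing: the switch changes which library executes the convolution, not whether the GPU can execute it.

    import torch

    # Disable cuDNN globally. GPU convolutions then bypass cuDNN and use
    # PyTorch's own CUDA implementation instead.
    torch.backends.cudnn.enabled = False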

Try reducing the batch size in the .cfg file.

linsongxue, Jun 07 '21 03:06
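In a Darknet-style .cfg file the batch size is presumably the batch= option in the [net] section, so the suggestion above amounts to lowering that value, which is easiest to do by hand. The sketch below does the same edit in Python. It is hypothetical: set_net_batch is not part of Fewshot_Detection, and which file (cfg/darknet_dynamic.cfg or cfg/reweighting_net.cfg) and which option train_meta.py actually reads its batch size from should be checked in the script itself.

    import re


    def set_net_batch(cfg_text, new_batch):
        """Return cfg_text with the first batch= line of [net] replaced.

        Hypothetical helper: assumes a Darknet-style .cfg where options are
        'key=value' lines grouped under '[section]' headers.
        """
        lines = cfg_text.splitlines()
        in_net = False
        for i, line in enumerate(lines):
            stripped = line.strip()
            if stripped.startswith("["):
                # Track whether we are inside the [net] (or [network]) section.
                in_net = stripped.lower() in ("[net]", "[network]")
                continue
            # Match only the 'batch' key; 'batch_normalize' and commented-out
            # lines such as '#batch=1' do not match this pattern.
            if in_net and re.match(r"batch\s*=", stripped):
                lines[i] = "batch=%d" % new_batch
                break  # only the first batch= in [net] is changed
        return "\n".join(lines) + "\n"


    if __name__ == "__main__":
        cfg = "[net]\nbatch=64\nsubdivisions=8\n\n[convolutional]\nbatch_normalize=1\n"
        print(set_net_batch(cfg, 16))

In practice you would read the .cfg file, pass its text through the function, and write the result back; subdivisions= and every other section are left untouched.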