cudnn.torch
cudnnConvolutionBackwardData failed - Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData)
I'm not sure what is causing this error or how to fix it:
```
cudnnConvolutionBackwardData failed: 9 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA1,3,2615,2816 -filtA64,3,3,3 1,64,2615,2816 -padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/ubuntu/torch/install/share/lua/5.1/cudnn/find.lua:94: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardData)
stack traceback:
[C]: in function 'error'
/home/ubuntu/torch/install/share/lua/5.1/cudnn/find.lua:94: in function 'checkedCall'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:212: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:201>
[C]: in function 'xpcall'
/home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function </home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:50>
[C]: in function 'pcall'
/home/ubuntu/torch/install/share/lua/5.1/cutorch/init.lua:32: in function 'withDevice'
/home/ubuntu/torch/install/share/lua/5.1/nn/GPU.lua:112: in function </home/ubuntu/torch/install/share/lua/5.1/nn/GPU.lua:108>
[C]: in function 'xpcall'
/home/ubuntu/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:58: in function 'updateGradInput'
neural_style.lua:284: in function 'opfunc'
/home/ubuntu/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
neural_style.lua:307: in function 'main'
neural_style.lua:601: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
```
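The `hash=` field in the first line of that log encodes the convolution that failed. As a hedged sketch (the helpers `parseList`, `decodeHash` and `expectedOutput` are hypothetical, not part of cudnn.torch), this plain-Lua snippet pulls out the input dims (`-dimA`), filter dims (`-filtA`), output dims, padding and stride, and recomputes the output shape with the usual convolution formula, assuming a dilation of 1. For this log the shapes are self-consistent: a 3×3 filter with padding 1 and stride 1 keeps the 2615×2816 spatial size.

```lua
-- Hypothetical helpers (not part of cudnn.torch): decode the "hash=" field of a
-- cudnn.torch error line and recompute the convolution's output shape with
--   out = floor((in + 2*pad - kernel) / stride) + 1   (assumes dilation = 1)

-- Turn "1,3,2615,2816" into {1, 3, 2615, 2816}.
local function parseList(s)
  assert(s, 'field not found in hash string')
  local t = {}
  for n in s:gmatch('%d+') do t[#t + 1] = tonumber(n) end
  return t
end

local function decodeHash(line)
  return {
    input  = parseList(line:match('%-dimA([%d,]+)')),           -- N, C, H, W
    filter = parseList(line:match('%-filtA([%d,]+)')),          -- K, C, kH, kW
    output = parseList(line:match('%-filtA[%d,]+%s+([%d,]+)')), -- N, K, H, W
    pad    = parseList(line:match('%-padA([%d,]+)')),           -- padH, padW
    stride = parseList(line:match('%-convStrideA([%d,]+)')),    -- strideH, strideW
  }
end

-- Output shape implied by the input, filter, padding and stride.
local function expectedOutput(d)
  local h = math.floor((d.input[3] + 2 * d.pad[1] - d.filter[3]) / d.stride[1]) + 1
  local w = math.floor((d.input[4] + 2 * d.pad[2] - d.filter[4]) / d.stride[2]) + 1
  return { d.input[1], d.filter[1], h, w }  -- N, K (output channels), H, W
end

local line = 'cudnnConvolutionBackwardData failed: 9 convDesc=[mode : CUDNN_CROSS_CORRELATION ' ..
  'datatype : CUDNN_DATA_FLOAT] hash=-dimA1,3,2615,2816 -filtA64,3,3,3 1,64,2615,2816 ' ..
  '-padA1,1 -convStrideA1,1 CUDNN_DATA_FLOAT'

local d = decodeHash(line)
print('reported output: ' .. table.concat(d.output, ','))           -- 1,64,2615,2816
print('computed output: ' .. table.concat(expectedOutput(d), ','))  -- 1,64,2615,2816
print('channels match: ' .. tostring(d.input[2] == d.filter[2]))    -- true
```

The same helpers can be pointed at any other `hash=` string from a cudnn.torch error to check its shapes before looking elsewhere.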
I have been trying to push the image size as far as it can go, and since search engines turn up almost nothing for this error, I may have hit a limit in Torch7 and/or cuDNN.
I was running the latest version of Torch on Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1038-aws x86_64) with CUDA 9.0 and cuDNN v7.
I assume this error comes from a limit on the maximum supported size. If so, can that maximum be changed?
The error appears to come from these lines:
In SpatialConvolution.lua, on line 201: https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua#L201
In SpatialConvolution.lua, on line 209: https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua#L209
@soumith How do I fix this limitation?
Related Issues:
https://github.com/jzbontar/mc-cnn/issues/16
https://github.com/allenai/XNOR-Net/issues/22
https://github.com/soumith/dcgan.torch/issues/67
https://github.com/facebook/fb.resnet.torch/issues/153
After setting cudnn.verbose = true, it looks like this may be an out-of-memory issue after all:
https://gist.github.com/ProGamerGov/9e5b367a90cd4be9cbd1ed023dafbb81
I thought I could go a lot higher in image size with Neural-Style, but I did that on an install with earlier versions of Torch and CUDA/cuDNN. Either Torch7 or CUDA/cuDNN has become less memory-efficient since then, and that is probably why I can't go any higher in image size: https://github.com/jcjohnson/neural-style/issues/429
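A quick estimate is consistent with the memory explanation. The plain-Lua sketch below assumes 4-byte `CUDNN_DATA_FLOAT` elements and takes the shapes from the `hash=` field of the first error. Each 64-channel 2615×2816 activation comes to about 1.76 GiB, and training keeps several tensors of that size alive at once (for example a layer's output and the gradient flowing back into it), before counting cuDNN's workspace or the rest of the network.

```lua
-- Rough float32 memory for the tensors in the first error's descriptor.
-- Assumes 4 bytes per CUDNN_DATA_FLOAT element; cuDNN workspace and the rest
-- of the network are NOT included.
local BYTES_PER_FLOAT = 4

-- Number of elements in a tensor of the given shape.
local function numel(shape)
  local n = 1
  for _, dim in ipairs(shape) do n = n * dim end
  return n
end

local function gib(bytes) return bytes / 2^30 end

local input  = {1, 3, 2615, 2816}   -- -dimA: N, C, H, W (also the shape of gradInput)
local output = {1, 64, 2615, 2816}  -- conv output (also the shape of gradOutput)

local inBytes  = numel(input)  * BYTES_PER_FLOAT
local outBytes = numel(output) * BYTES_PER_FLOAT

print(string.format('input / gradInput:   %.2f GiB each', gib(inBytes)))   -- 0.08
print(string.format('output / gradOutput: %.2f GiB each', gib(outBytes)))  -- 1.76
```

Since the element count grows with height × width, each step up in image size grows every one of these full-resolution tensors at the same rate.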
Try limiting your workspace size by setting cudnn.maxWorkspaceGPUMemPercent (say, to 30 or 40).
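A minimal sketch of that suggestion, assuming it goes in the script's setup code before the first forward/backward pass (that this is when cudnn.torch reads the setting is an assumption). `cudnn.maxWorkspaceGPUMemPercent` and `cudnn.verbose` are the settings named in this thread; the `configureCudnn` helper and the value 30 are illustrative. The `pcall(require, 'cudnn')` guard only lets the snippet run harmlessly where cudnn.torch is not installed.

```lua
-- Hypothetical helper: apply the workspace cap suggested above, plus verbose
-- logging so the effect on cuDNN's choices can be inspected.
local function configureCudnn(cudnn, percent)
  cudnn.verbose = true                      -- print extra debugging information
  cudnn.maxWorkspaceGPUMemPercent = percent -- e.g. 30 or 40, as suggested
  return cudnn
end

-- Only touch cudnn if cudnn.torch is actually installed.
local ok, cudnn = pcall(require, 'cudnn')
if ok then
  configureCudnn(cudnn, 30)
else
  print('cudnn.torch not found; settings not applied')
end
```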
Hi guys, has any of you made progress on this problem? I have a similar error with cudnnConvolutionBackwardFilter. The full error message is below:
```
cudnnConvolutionBackwardFilter failed: 9 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA93700,3,20,9 -filtA10,3,9,9 93700,10,12,1 -padA0,0 -convStrideA1,1 CUDNN_DATA_FLOAT
/usr/local/mnt/vega_scratch/scratch/bio_vad/src/torch/install/bin/luajit: ...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 2 module of nn.Sequential:
...h/bio_vad/src/torch/install/share/lua/5.1/cudnn/find.lua:94: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnConvolutionBackwardFilter)
stack traceback:
[C]: in function 'error'
...h/bio_vad/src/torch/install/share/lua/5.1/cudnn/find.lua:94: in function 'checkedCall'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:264: in function 'accGradParameters'
...ch/bio_vad/src/torch/install/share/lua/5.1/nn/Module.lua:32: in function <...ch/bio_vad/src/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:87: in function <...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:81>
[C]: in function 'xpcall'
...bio_vad/src/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...io_vad/src/torch/install/share/lua/5.1/nn/Sequential.lua:91: in function 'backward'
...ai/code/CLVTtorch/CLVT_SSF_Trainer/train_noSequencer.lua:106: in function 'opfunc'
...o_vad/src/torch/install/share/lua/5.1/optim/adadelta.lua:31: in function 'optimMethod'
...ai/code/CLVTtorch/CLVT_SSF_Trainer/train_noSequencer.lua:212: in main chunk
[C]: in function 'dofile'
...ode/CLVTtorch/CLVT_SSF_Trainer/trainCLVT_noSequencer.lua:124: in main chunk
[C]: in function 'dofile'
.../src/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004064f0
```
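To help answer the memory question, here is a rough size estimate for this descriptor, as a plain-Lua sketch assuming 4-byte float elements and the shapes from its `hash=` field (`-dimA93700,3,20,9`, `-filtA10,3,9,9`, output `93700,10,12,1`). The input comes to about 193 MiB and the output-sized gradient to about 43 MiB, far smaller than the tensors in the original report; cuDNN's workspace and the rest of the model are not included in these figures.

```lua
-- Rough float32 sizes for the cudnnConvolutionBackwardFilter descriptor
-- (assumes 4 bytes per element; cuDNN workspace and the rest of the model are
-- NOT included). Every activation-sized tensor scales linearly with N.
local BYTES_PER_FLOAT = 4
local MIB = 2^20

-- Number of elements in a tensor of the given shape.
local function numel(shape)
  local n = 1
  for _, dim in ipairs(shape) do n = n * dim end
  return n
end

local N            = 93700                 -- batch size from -dimA
local perSampleIn  = numel({3, 20, 9})     -- C, H, W of the input
local perSampleOut = numel({10, 12, 1})    -- C, H, W of the output / gradOutput
local filterElems  = numel({10, 3, 9, 9})  -- -filtA: weights (and gradWeight)

local inMiB  = N * perSampleIn  * BYTES_PER_FLOAT / MIB
local outMiB = N * perSampleOut * BYTES_PER_FLOAT / MIB

print(string.format('input:      %.1f MiB', inMiB))         -- 193.0
print(string.format('gradOutput: %.1f MiB', outMiB))        -- 42.9
print(string.format('gradWeight: %d floats', filterElems))  -- 2430
```

Because everything except the weights scales with N, halving the batch halves the input and gradOutput figures, while the 2430-element filter stays the same.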
Is this a memory issue?
Cheers
@ProGamerGov Have you solved this problem?