Pytorch-Deeplab

CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Open J-zin opened this issue 6 years ago • 5 comments

Hi, I am using this repo to train on my own dataset, but it fails with the error below. Can you give me some suggestions? (My segmentation class images are grayscale images.)

/opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [190,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [3,0,0], thread: [191,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 234, in <module>
    main()
  File "train.py", line 215, in main
    loss.backward()
  File "/home/zhangjunyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zhangjunyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR

J-zin avatar Oct 31 '18 07:10 J-zin

I got it: every gray value in the label image must be lower than the number of classes! That is exactly what the `t >= 0 && t < n_classes` assertion checks.
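A quick way to confirm this before training is to scan a label image for values outside the valid range. The snippet below is only a sketch: `find_invalid_labels` and its `ignore_label` parameter are hypothetical names, not part of this repo, and whether a value such as 255 is actually ignored depends on how the loss is configured in your training script.

```python
import numpy as np


def find_invalid_labels(label, num_classes, ignore_label=None):
    """Return the sorted label values that fall outside [0, num_classes).

    `ignore_label` is an assumption: pass it only if your loss is
    configured to skip that value.
    """
    # Cast to int64 so comparisons are safe for any input dtype.
    values = np.unique(np.asarray(label)).astype(np.int64)
    if ignore_label is not None:
        values = values[values != ignore_label]
    bad = values[(values < 0) | (values >= num_classes)]
    return bad.tolist()


# Toy 2x3 grayscale mask standing in for a real label image.
mask = np.array([[0, 1, 2], [1, 128, 255]], dtype=np.uint8)

print(find_invalid_labels(mask, num_classes=3))
# [128, 255]
print(find_invalid_labels(mask, num_classes=3, ignore_label=255))
# [128]
```

An empty list means every pixel is a valid class index. Anything else would trip the assertion shown in the traceback.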

J-zin avatar Oct 31 '18 08:10 J-zin

> i got it, the gray num must be lower than the class nums!!!

I got the same problem. Could you please teach me how to fix it?

Raven1327 avatar Jan 04 '19 08:01 Raven1327

@Raven1327 Are you running this repo on the PASCAL VOC dataset or on another dataset? You can simply modify NUM_CLASSES in this line to match your dataset.
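If the grayscale masks store arbitrary gray levels (for example 0, 128, 255) rather than class indices, changing NUM_CLASSES alone may not be enough, because the values themselves still exceed the class count. A minimal sketch of remapping them to contiguous indices follows, assuming 8-bit masks and a known list of gray levels; `remap_gray_to_class_indices` is a hypothetical helper, not something this repo provides.

```python
import numpy as np


def remap_gray_to_class_indices(mask, gray_values):
    """Map each listed gray level to a class index 0..len(gray_values)-1.

    Assumes an 8-bit mask. Raises if the mask contains a gray level
    that is not listed, so nothing is remapped silently.
    """
    mask = np.asarray(mask, dtype=np.uint8)
    gray_values = sorted(set(gray_values))
    unknown = sorted(set(np.unique(mask).tolist()) - set(gray_values))
    if unknown:
        raise ValueError(f"unlisted gray levels in mask: {unknown}")
    # 256-entry lookup table: gray level -> class index.
    lut = np.zeros(256, dtype=np.int64)
    for class_index, gray in enumerate(gray_values):
        lut[gray] = class_index
    return lut[mask]


gray_levels = [0, 128, 255]           # assumed gray levels for a toy dataset
toy_mask = np.array([[0, 128], [255, 128]], dtype=np.uint8)

remapped = remap_gray_to_class_indices(toy_mask, gray_levels)
print(remapped.tolist())              # [[0, 1], [2, 1]]
print(len(gray_levels))               # 3, the value NUM_CLASSES would need
```

The same list of gray levels should be used for every image in the dataset so that each class keeps one index throughout.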

speedinghzl avatar Jan 07 '19 04:01 speedinghzl

@speedinghzl Thanks for your help. It can run now. I made some mistakes. T-T

Raven1327 avatar Jan 08 '19 10:01 Raven1327

@speedinghzl I use the VOC dataset with NUM_CLASSES=21, and I also get the same problem.

Lixy1997 avatar Jul 23 '20 11:07 Lixy1997