MSRF-Net_PyTorch
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED when loss.backward() in train.py
Thanks for your great work; your code is so clean that I could easily understand it. I just hit an error in train.py when calling loss.backward(): `RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED`. Have you ever seen this before, and do you have any suggestions for fixing it? Thanks a lot!
Hi @chiendoanngoc! You're welcome! I've faced the same issue and fixed it by using another version of PyTorch. I'm currently using version 1.9.0+cu111, but the right version depends on your CUDA version. You can find all previous PyTorch versions here
I just changed the README file to avoid confusion about the PyTorch version.
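For reference, installing a pinned build from the previous-versions page looks roughly like this (the `+cu111` wheels are just an example matching the version mentioned above; swap the tag for your own CUDA version):

```shell
# Example only: install PyTorch 1.9.0 built for CUDA 11.1 from the
# official stable-wheel index. Replace the +cuXXX tags to match the
# CUDA version reported by nvidia-smi on your machine.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
```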
Hello @amlarraz, @chiendoanngoc, I changed the torch version to 1.9.0+cu111 but I still get the same error. I'm using Colab as my working environment.
```
Logdir: ./logs/combination-2_7_2022-18h40m33s
Train epoch: 1:   0%|          | 0/1113 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-1-13d8a8766d4b>](https://localhost:8080/#) in <module>()
     60     loss = criterion(pred_3, pred_canny, pred_1, pred_2, msk, canny_label)
     61     loss = loss/accumulation_steps
---> 62     loss.backward()
     63     # accumulative gradient
     64     if (i + 1) % accumulation_steps == 0:  # Wait for several backward steps

1 frames
[/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py](https://localhost:8080/#) in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150
    151

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
```
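For context, the loop in the traceback is doing gradient accumulation: scale the loss by `accumulation_steps`, call `backward()` every batch, and step the optimizer only once per group of batches. A minimal CPU-only sketch of that pattern (the model, data, and names here are illustrative, not the repo's exact code):

```python
import torch
from torch import nn

# Illustrative gradient-accumulation sketch, not the repo's train.py.
# Dividing each loss by accumulation_steps makes the accumulated gradient
# equal to the average gradient over the group of mini-batches.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
accumulation_steps = 4

optimizer.zero_grad()
for i in range(8):
    x = torch.randn(2, 4)
    y = torch.randn(2, 1)
    loss = criterion(model(x), y) / accumulation_steps
    loss.backward()                      # gradients accumulate in .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                 # update once per accumulated group
        optimizer.zero_grad()            # reset for the next group
```

The cuDNN error itself is unrelated to this pattern; `backward()` is simply the first call that exercises the cuDNN kernels, which is why the mismatch surfaces there.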
```
>>> import torch
>>> torch.__version__
'1.9.0+cu111'
```
Hi @Twixii99, which CUDA version are you using? Remember that the PyTorch version depends on the CUDA version you're using. If you're using this PyTorch build and the Colab environment is running a CUDA version other than 11.1, PyTorch will give you errors. To find out which CUDA version you're using, run the command `!nvidia-smi`
in one cell. To choose the correct PyTorch version according to your CUDA version, you can visit this page.
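One quick way to compare the two is to read the CUDA tag out of the PyTorch version string and check it against what `!nvidia-smi` reports. A small hypothetical helper (not part of the repo; the `cuXXX` parsing is a heuristic for the usual wheel naming):

```python
def cuda_tag(torch_version):
    """Extract the CUDA tag from a PyTorch version string, e.g.
    '1.9.0+cu111' -> '11.1'. Returns None for CPU-only builds.
    Heuristic: the last digit is the minor version ('cu111' = 11.1,
    'cu102' = 10.2), which matches the usual wheel naming."""
    if "+cu" not in torch_version:
        return None
    digits = torch_version.split("+cu", 1)[1]
    return f"{digits[:-1]}.{digits[-1]}"

# In a notebook you would pass torch.__version__ here and compare the
# result with the CUDA version printed by !nvidia-smi.
print(cuda_tag("1.9.0+cu111"))  # -> 11.1
```

If the tag disagrees with the driver's CUDA version, that mismatch is the usual cause of `CUDNN_STATUS_NOT_INITIALIZED`.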