
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Open · buptgxt opened this issue 4 years ago · 1 comment

Traceback (most recent call last):
  File "main.py", line 116, in <module>
    main(args)
  File "main.py", line 38, in main
    trainer.train(c_ids, q_ids, a_ids, start_positions, end_positions)
  File "/home/2018/Info-HCVAE-master/vae/trainer.py", line 35, in train
    loss.backward()
  File "/home/2018/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/tensor.py", line 150, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/2018/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Environment: CUDA 9.0, cuDNN 7, Python 3.6, PyTorch 1.3

Thank you for your work! I am trying to train the model, but I get the error "RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED" when this line runs: loss.backward(). Can you help me solve it?

buptgxt, Nov 13 '20 08:11
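When reporting a cuDNN error like this, it can help to print the versions PyTorch actually loaded, since CUDNN_STATUS_NOT_INITIALIZED is often reported either when cuDNN cannot create its handle (for example because GPU memory is already exhausted) or when the CUDA/cuDNN build does not match the system. The script below is a minimal sketch, not part of Info-HCVAE; the function name collect_env_info is hypothetical. It uses only standard PyTorch calls (torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.memory_allocated) and still runs if PyTorch is not installed.

```python
import platform


def collect_env_info():
    """Hypothetical helper: gather the versions that matter for cuDNN errors.

    Works even when PyTorch is not installed, in which case only the
    Python version is reported.
    """
    info = {"python": platform.python_version(), "torch_available": False}
    try:
        import torch
    except ImportError:
        return info

    info["torch_available"] = True
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    # cuDNN version PyTorch loaded (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # Memory already held by tensors on GPU 0, in bytes.
        info["memory_allocated_bytes"] = torch.cuda.memory_allocated(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(key + ": " + str(value))
```

If the reported cuDNN version is None while CUDA is available, or the CUDA build version differs from the installed toolkit, a version mismatch is a more likely cause than batch size.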

I think it might be caused by insufficient GPU memory.

Try using a smaller batch size.

seanie12, Feb 05 '21 07:02
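The "smaller batch size" advice can be automated by halving the batch size until one training step succeeds. The sketch below assumes a hypothetical run_step(batch_size) callable that runs one forward/backward pass; it is not part of Info-HCVAE, and the actual batch-size option in main.py is not shown in this issue. It only retries on errors whose message looks memory- or cuDNN-related, so unrelated bugs still surface.

```python
def find_workable_batch_size(run_step, batch_size, min_batch_size=1):
    """Hypothetical helper: halve batch_size until run_step(batch_size)
    finishes without a memory-related RuntimeError.

    Returns the first batch size that worked.
    """
    # Substrings seen in PyTorch OOM / cuDNN failure messages.
    retry_markers = (
        "out of memory",
        "CUDNN_STATUS_NOT_INITIALIZED",
        "CUDNN_STATUS_EXECUTION_FAILED",
    )
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if not any(marker in str(err) for marker in retry_markers):
                raise  # unrelated error: do not hide it
            # In real PyTorch code, torch.cuda.empty_cache() could be called
            # here to release cached blocks before retrying.
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fits in memory" % min_batch_size)


if __name__ == "__main__":
    def fake_step(batch_size):
        # Simulates a GPU that can only hold batches of 8 or fewer.
        if batch_size > 8:
            raise RuntimeError("CUDA out of memory.")

    print(find_workable_batch_size(fake_step, 64))  # 64 -> 32 -> 16 -> 8, prints 8
```

A smaller batch changes the optimization dynamics; if results degrade, gradient accumulation over several small batches can restore the original effective batch size.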