ENAS-pytorch
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
I encountered this strange error. Here is the output:
$ python main.py
2020-10-17 06:19:37,971:INFO::[*] Make directories : logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:45,686:INFO::regularizing:
2020-10-17 06:19:56,858:INFO::# of parameters: 146,014,000
2020-10-17 06:19:57,208:INFO::[*] MODEL dir: logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:57,208:INFO::[*] PARAM path: logs/ptb_2020-10-17_06-19-37/params.json
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2020-10-17 06:19:57,872:INFO::max hidden 3.5992980003356934
2020-10-17 06:19:58,043:INFO::abs max grad 0
/home/ubuntu/ENAS-pytorch/trainer.py:323: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
self.args.shared_grad_clip)
2020-10-17 06:19:58,879:INFO::abs max grad 0.05615033581852913
2020-10-17 06:19:59,448:INFO::max hidden 9.425106048583984
2020-10-17 06:19:59,774:INFO::abs max grad 0.0575626865029335
2020-10-17 06:20:01,810:INFO::abs max grad 0.12187317758798599
2020-10-17 06:20:03,771:INFO::abs max grad 0.5459710359573364
2020-10-17 06:20:07,741:INFO::max hidden 15.914213180541992
2020-10-17 06:20:17,945:INFO::abs max grad 0.8663018941879272
2020-10-17 06:20:41,948:INFO::| epoch 0 | lr 20.00 | raw loss 8.39 | loss 8.39 | ppl 4402.23
2020-10-17 06:21:21,796:INFO::| epoch 0 | lr 20.00 | raw loss 7.20 | loss 7.20 | ppl 1343.73
2020-10-17 06:21:26,601:INFO::max hidden 20.534639358520508
2020-10-17 06:22:06,855:INFO::| epoch 0 | lr 20.00 | raw loss 7.00 | loss 7.00 | ppl 1093.28
2020-10-17 06:22:07,417:INFO::max hidden 22.71334457397461
2020-10-17 06:22:19,596:INFO::clipped 1 hidden states in one forward pass. max clipped hidden state norm: 25.37160301208496
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 313, in train_shared
    loss.backward()
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 1000]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
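For anyone trying to locate the offending operation: the hint at the end of the error message is the practical next step. Below is a minimal, self-contained sketch (purely illustrative tensors and names, not the ENAS code) that reproduces this class of error and shows where anomaly detection would be enabled; with it turned on, the backward error additionally prints a traceback pointing at the forward-pass operation whose output was later modified in place.

```python
import torch

# Debugging aid suggested by the error message; it slows training,
# so enable it only while hunting the bad in-place op.
torch.autograd.set_detect_anomaly(True)

w = torch.randn(32, 1000, requires_grad=True)
x = torch.randn(32, 1000, requires_grad=True)

h = x + 1          # h is "output 0 of AddBackward0"
y = (h * w).sum()  # the multiply saves h, needed later for w's gradient
h *= 0.5           # in-place edit bumps h's version from 0 to 1
y.backward()       # RuntimeError: ... modified by an inplace operation
```

The fix has the same shape as the one proposed below for shared_rnn.py: replace the in-place update with an out-of-place one (h = h * 0.5).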
Did you solve this problem? I am hitting the same issue.
I hit this issue with torch 1.7.1 and worked around it by downgrading torch to an older version.
According to this discussion, older versions have autograd bugs that fail to correctly detect invalid in-place operations, which is why the code appears to run there.
After changing the in-place operations below, the code works fine for me when running the RNN model on torch 1.8.0.
In https://github.com/carpedm20/ENAS-pytorch/blob/master/models/shared_rnn.py#L248, change

clipped_num += 1

to

clipped_num = clipped_num + 1

and change

hidden *= torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False)

to

hidden = hidden * torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False)
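To make the second change concrete, here is a hedged sketch of the out-of-place hidden-state rescaling. The names hidden, mask, and the max-norm threshold follow the snippet above; the helper function and its exact structure are an illustration, not the repository's verbatim code (which builds the mask in numpy once per time step inside forward()).

```python
import numpy as np
import torch

def clip_hidden_norm(hidden, max_norm=25.0):
    """Rescale rows of `hidden` whose L2 norm exceeds `max_norm`,
    without modifying `hidden` in place, so autograd's saved tensors
    keep their expected version."""
    hidden_norms = hidden.norm(dim=-1).detach().cpu().numpy()
    clip_select = hidden_norms > max_norm
    if not clip_select.any():
        return hidden, 0  # nothing clipped this step

    mask = np.ones(hidden.shape, dtype=np.float32)
    mask[clip_select] = (max_norm / hidden_norms[clip_select])[:, np.newaxis]

    # Out-of-place multiply: `hidden * mask` instead of `hidden *= mask`.
    mask_t = torch.from_numpy(mask).to(hidden.device)
    return hidden * mask_t, 1
```

On recent PyTorch, torch.autograd.Variable is only a thin wrapper, so multiplying by a plain tensor as above is equivalent to the Variable(...) form; the essential change is simply that the multiplication is no longer in place. The clipped_num change is harmless either way, and if clipped_num is a plain Python int it is not the operation autograd complains about.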