
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Open dangne opened this issue 3 years ago • 3 comments

I encountered this strange error. Here is the output:

$ python main.py 
2020-10-17 06:19:37,971:INFO::[*] Make directories : logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:45,686:INFO::regularizing:
2020-10-17 06:19:56,858:INFO::# of parameters: 146,014,000
2020-10-17 06:19:57,208:INFO::[*] MODEL dir: logs/ptb_2020-10-17_06-19-37
2020-10-17 06:19:57,208:INFO::[*] PARAM path: logs/ptb_2020-10-17_06-19-37/params.json
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1614: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1625: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
2020-10-17 06:19:57,872:INFO::max hidden 3.5992980003356934
2020-10-17 06:19:58,043:INFO::abs max grad 0
/home/ubuntu/ENAS-pytorch/trainer.py:323: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  self.args.shared_grad_clip)
2020-10-17 06:19:58,879:INFO::abs max grad 0.05615033581852913
2020-10-17 06:19:59,448:INFO::max hidden 9.425106048583984
2020-10-17 06:19:59,774:INFO::abs max grad 0.0575626865029335
2020-10-17 06:20:01,810:INFO::abs max grad 0.12187317758798599
2020-10-17 06:20:03,771:INFO::abs max grad 0.5459710359573364
2020-10-17 06:20:07,741:INFO::max hidden 15.914213180541992
2020-10-17 06:20:17,945:INFO::abs max grad 0.8663018941879272
2020-10-17 06:20:41,948:INFO::| epoch   0 | lr 20.00 | raw loss 8.39 | loss 8.39 | ppl  4402.23
2020-10-17 06:21:21,796:INFO::| epoch   0 | lr 20.00 | raw loss 7.20 | loss 7.20 | ppl  1343.73
2020-10-17 06:21:26,601:INFO::max hidden 20.534639358520508
2020-10-17 06:22:06,855:INFO::| epoch   0 | lr 20.00 | raw loss 7.00 | loss 7.00 | ppl  1093.28
2020-10-17 06:22:07,417:INFO::max hidden 22.71334457397461
2020-10-17 06:22:19,596:INFO::clipped 1 hidden states in one forward pass. max clipped hidden state norm: 25.37160301208496
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/ubuntu/ENAS-pytorch/trainer.py", line 313, in train_shared
    loss.backward()
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/ubuntu/anaconda3/envs/enas-pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 1000]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
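
The hint at the end of the traceback points at PyTorch's anomaly detection, which also prints the forward-pass location of the operation whose output was later modified in place. A minimal sketch of how it could be turned on in main.py around the trnr.train() call from the traceback (illustrative only, not the repository's actual setup):

import torch

# Sketch: enable anomaly detection so the backward error above also reports
# where the in-place-modified tensor was produced in the forward pass.
torch.autograd.set_detect_anomaly(True)   # global switch

# or, scoped to the failing call from the traceback:
# with torch.autograd.detect_anomaly():
#     trnr.train()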

dangne avatar Oct 17 '20 06:10 dangne

Did you solve this problem? I'm hitting the same issue.

lyjzsyzlt avatar Dec 14 '20 07:12 lyjzsyzlt

I hit this issue with torch 1.7.1 and solved it by downgrading to an older torch version.

STONEKONG avatar Jan 12 '21 02:01 STONEKONG

According to this discussion, older versions of PyTorch had autograd bugs that failed to correctly detect invalid in-place operations, which is why newer versions raise this error.

After changing the in-place operations below, the code works fine for me when running the RNN model on torch 1.8.0.

https://github.com/carpedm20/ENAS-pytorch/blob/master/models/shared_rnn.py#L248

Change clipped_num += 1 to clipped_num = clipped_num + 1

and

change hidden *= torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False) to hidden = hidden * torch.autograd.Variable(torch.FloatTensor(mask).cuda(), requires_grad=False)
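
For context, here is a minimal standalone sketch of the same failure mode, using placeholder tensors and shapes rather than the ENAS code: a tensor that a later matmul has saved for its backward pass is multiplied in place, which bumps its version counter and triggers essentially the same RuntimeError as in the traceback, while the out-of-place form builds a new tensor and leaves the saved one intact.

import torch

# Minimal sketch (placeholder shapes, not the shared_rnn code): 'hidden' is saved
# by the matmul for its backward pass and is then modified in place, so autograd
# finds it at version 1 where it expected version 0.
w = torch.randn(1000, 10, requires_grad=True)
hidden = torch.randn(32, 1000, requires_grad=True) + 0.0  # non-leaf result of earlier ops
logits = hidden @ w                                       # backward needs the saved 'hidden'
mask = torch.full((32, 1), 0.5)                           # illustrative clipping factor

hidden *= mask            # in-place: RuntimeError when backward() runs below
# hidden = hidden * mask  # out-of-place rewrite, as in the fix above

logits.sum().backward()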

david90103 avatar Feb 20 '21 14:02 david90103