After modifying this line to include retain_graph=True:
D_loss.backward(retain_graph=True)
I now keep getting this error:
python .\main_Digits.py
Pre-train wae
Loading MNIST dataset.
Loading MNIST dataset.
Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
File ".\main_Digits.py", line 321, in
main()
File ".\main_Digits.py", line 83, in main
train(model, exp_name, kwargs)
File ".\main_Digits.py", line 97, in train
wae_train(wae, discriminator, train_loader, wae_optimizer, d_optimizer, epoch)
File ".\main_Digits.py", line 290, in wae_train
D_z_tilde = D(z_tilde.clone())
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\admin\Desktop\MPhil\Domain_Generalization\models\ada_conv.py", line 66, in forward
return self.net(z)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
input = module(input)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\functional.py", line 1610, in linear
ret = torch.addmm(bias, input, weight.t())
(print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File ".\main_Digits.py", line 321, in
main()
File ".\main_Digits.py", line 83, in main
train(model, exp_name, kwargs)
File ".\main_Digits.py", line 97, in train
wae_train(wae, discriminator, train_loader, wae_optimizer, d_optimizer, epoch)
File ".\main_Digits.py", line 306, in wae_train
loss.backward() # No need to retain the graph here if this is the final use of it
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\autograd_init_.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
You might have to move the d_optimizer.step() call until after loss.backward()
has been called in the WAE training loop. The code was presumably written against an older version of PyTorch that did not raise this error, but the gradients computed with the existing ordering of operations are not correct.
Basically, the discriminator's parameters are updated in place when d_optimizer.step()
is called. Because those parameters have been modified, the subsequent loss.backward() call can no longer compute gradients from the values saved during the forward pass - this is the in-place update the error message is referring to (see the sketch after the links below). More information about the error here:
- https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256/43
- https://discuss.pytorch.org/t/runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation-torch-floattensor-1-is-at-version-2-expected-version-1-instead/164474
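As a rough sketch of the reordering: the names wae, discriminator, wae_optimizer, d_optimizer and z_tilde follow the traceback, but the forward signatures and loss expressions below are placeholders - only the ordering of backward() and step() is the point.

```python
import torch
import torch.nn.functional as F

def wae_train_step(x, wae, discriminator, wae_optimizer, d_optimizer):
    # Assumed WAE forward signature: returns reconstruction and latent code.
    x_recon, z_tilde = wae(x)
    z = torch.randn_like(z_tilde)          # samples from the prior

    d_z = discriminator(z)
    d_z_tilde = discriminator(z_tilde)

    # Placeholder discriminator loss (real prior samples vs. encoded samples).
    # Note: gradients from D_loss also reach the encoder through z_tilde here;
    # the real code may detach or clone z_tilde for this term.
    D_loss = F.binary_cross_entropy_with_logits(d_z, torch.ones_like(d_z)) + \
             F.binary_cross_entropy_with_logits(d_z_tilde, torch.zeros_like(d_z_tilde))

    # Placeholder WAE loss: reconstruction + adversarial term.
    loss = F.mse_loss(x_recon, x) + \
           F.binary_cross_entropy_with_logits(d_z_tilde, torch.ones_like(d_z_tilde))

    d_optimizer.zero_grad()
    wae_optimizer.zero_grad()

    # Run both backward passes before any optimizer mutates the parameters,
    # so the tensors saved for the backward pass are still at the version
    # autograd expects.
    D_loss.backward(retain_graph=True)      # graph is reused by loss.backward()
    loss.backward()

    # Parameter updates come last.
    d_optimizer.step()
    wae_optimizer.step()
```

The design point is simply that every backward() that depends on the discriminator must run before d_optimizer.step() modifies the discriminator's weights in place; once all gradients are computed, the two step() calls can happen in either order.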