
RuntimeError in backward pass due to in-place operation

Open muhammadsmalik opened this issue 10 months ago • 1 comment

After modifying this line to include `retain_graph`: `D_loss.backward(retain_graph=True)`

I now keep getting this error:

```
python .\main_Digits.py
Pre-train wae
Loading MNIST dataset.
Loading MNIST dataset.
Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
  File ".\main_Digits.py", line 321, in <module>
    main()
  File ".\main_Digits.py", line 83, in main
    train(model, exp_name, kwargs)
  File ".\main_Digits.py", line 97, in train
    wae_train(wae, discriminator, train_loader, wae_optimizer, d_optimizer, epoch)
  File ".\main_Digits.py", line 290, in wae_train
    D_z_tilde = D(z_tilde.clone())
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\Desktop\MPhil\Domain_Generalization\models\ada_conv.py", line 66, in forward
    return self.net(z)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\nn\functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
 (print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Traceback (most recent call last):
  File ".\main_Digits.py", line 321, in <module>
    main()
  File ".\main_Digits.py", line 83, in main
    train(model, exp_name, kwargs)
  File ".\main_Digits.py", line 97, in train
    wae_train(wae, discriminator, train_loader, wae_optimizer, d_optimizer, epoch)
  File ".\main_Digits.py", line 306, in wae_train
    loss.backward()  # No need to retain the graph here if this is the final use of it
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\admin\anaconda3\envs\M-ADA\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
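For reference, the failure reduces to a small pattern. Below is a minimal sketch with toy shapes and names (not the repo's actual code) that reproduces the same RuntimeError on PyTorch >= 1.5: a backward pass with `retain_graph=True`, an in-place optimizer step, then a second backward over the same retained graph.

```python
import torch
import torch.nn as nn

D = nn.Linear(8, 1)  # toy stand-in for the discriminator's final linear layer
d_optimizer = torch.optim.SGD(D.parameters(), lr=0.1)

z_tilde = torch.randn(4, 8)
D_z_tilde = D(z_tilde)               # addmm saves weight.t() for the backward pass

D_loss = D_z_tilde.mean()
D_loss.backward(retain_graph=True)   # first backward; graph is kept alive
d_optimizer.step()                   # updates D.weight in place (version bump)

loss = -D_z_tilde.mean()
loss.backward()                      # RuntimeError: a variable needed for gradient
                                     # computation was modified by an inplace operation
```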

muhammadsmalik · Apr 16 '24 13:04

You might have to move the d_optimizer.step() call until after loss.backward() has been called in the WAE training loop. This code ran on an older version of PyTorch where the error was not thrown, but the gradients calculated under the existing sequence of operations are not correct.

Basically, it seems the model parameters are updated in place when d_optimizer.step() is called. Because those updated parameters are part of the retained graph, the subsequent loss.backward() call no longer works correctly; this is the in-place update the error message is referring to. More information about the error here (and see the sketch after these links):

  • https://discuss.pytorch.org/t/solved-pytorch1-5-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/90256/43
  • https://discuss.pytorch.org/t/runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation-torch-floattensor-1-is-at-version-2-expected-version-1-instead/164474
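For concreteness, here is a minimal sketch of that reordering, reusing the toy stand-ins from the snippet above rather than the repo's actual wae_train code: every backward pass that needs the shared graph runs before any optimizer step mutates parameters in place.

```python
import torch
import torch.nn as nn

# Same toy stand-ins as the earlier sketch.
D = nn.Linear(8, 1)
d_optimizer = torch.optim.SGD(D.parameters(), lr=0.1)
z_tilde = torch.randn(4, 8)

d_optimizer.zero_grad()
D_z_tilde = D(z_tilde)

D_loss = D_z_tilde.mean()
D_loss.backward(retain_graph=True)   # discriminator gradients

loss = -D_z_tilde.mean()
loss.backward()                      # last backward over the shared graph

# Only now mutate parameters in place; no backward pass still needs the
# pre-update weights. (In the real loop, wae_optimizer.step() likewise
# belongs after its backward pass.)
d_optimizer.step()
```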

lisa-lthorrold · Aug 01 '24 08:08