
Question about gradients during training

Open genzhengmiaohong opened this issue 1 year ago • 4 comments

Hello, while modifying train.py to train the network, I hit the following error when the loss computes its gradients at the end: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation. Do you know how to resolve this? My CUDA version is 12.2, so the versions in requirement.txt are not suitable for me; I first used torch 2.1.0 and then switched to 2.2.1+cu118, and the error occurs with both. Looking forward to your reply.

genzhengmiaohong avatar Feb 27 '24 08:02 genzhengmiaohong

Have you solved it? I ran into the same problem.

tangyz213 avatar Feb 29 '24 11:02 tangyz213

Can you provide more detailed error information, please? I need to pinpoint the location of the error.

ByChelsea avatar Mar 01 '24 11:03 ByChelsea

> Can you provide more detailed error information, please? I need to pinpoint the location of the error.

Traceback (most recent call last):
  File "train.py", line 177, in <module>
    train(args)
  File "train.py", line 140, in train
    loss.backward()
  File "C:\Users\yzc\.conda\envs\APRIL_GAN\lib\site-packages\torch\_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "C:\Users\yzc\.conda\envs\APRIL_GAN\lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [8, 1369, 768]], which is output 0 of DivBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
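
The hint at the end of the error can be followed literally: enabling anomaly detection (either torch.autograd.set_detect_anomaly(True) at the top of train.py, or the context manager below) makes the RuntimeError include the forward traceback of the op whose saved tensor was clobbered. A minimal, self-contained sketch with toy tensors, not the actual train.py code:

    import torch

    # Anomaly mode must cover the forward pass too, so the op that created the
    # saved tensor can be reported when backward later fails.
    with torch.autograd.detect_anomaly():
        x = torch.randn(4, 8, requires_grad=True)
        y = x.exp()          # exp() saves its output for the backward pass
        y /= 2               # in-place division bumps that tensor's version counter
        y.sum().backward()   # RuntimeError, now with the forward traceback of exp()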


My env: Windows 11, torch 2.2.2+cu121. In my environment, I modified line 122 in train.py to the following, and the error disappeared:

patch_tokens[layer] = patch_tokens[layer] / patch_tokens[layer].norm(dim=-1, keepdim=True)
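
The error message suggests the original line used the in-place `/=`, which modifies a tensor that autograd saved during the forward pass; the out-of-place division above creates a new tensor and leaves the saved one untouched. A minimal sketch with toy tensors (not the real patch tokens or text features):

    import torch

    # Toy stand-in for a patch-token list; exp() saves its output for backward,
    # much like ops inside the backbone save intermediate activations.
    patch_tokens = [torch.randn(2, 5, 8, requires_grad=True).exp()]

    # In-place normalisation would clobber that saved tensor and break backward:
    # patch_tokens[0] /= patch_tokens[0].norm(dim=-1, keepdim=True)   # RuntimeError at .backward()

    # Out-of-place normalisation builds a new tensor instead, so backward succeeds:
    patch_tokens[0] = patch_tokens[0] / patch_tokens[0].norm(dim=-1, keepdim=True)
    patch_tokens[0].sum().backward()
    print("backward OK")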

yangzc0214 avatar Apr 07 '24 08:04 yangzc0214

fix it here

oylz avatar Apr 15 '24 06:04 oylz