RuntimeError about AC_CartPole.py
I didn't change anything in 8_Actor_Critic_Advantage/AC_CartPole.py. I just ran it, and I got this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
So I added torch.autograd.set_detect_anomaly(True) to the code, but then I got this:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
My PyTorch version is 1.7.0 and my NumPy version is 1.18.5.
May I ask what your PyTorch environment is? Thanks.
After some trial-and-error changes to learn() in both the actor and the critic, it runs:
# Actor
def learn(self, s, a, td):
    s = torch.Tensor(s[np.newaxis, :])
    acts_prob = self.actor_net(s)
    log_prob = torch.log(acts_prob[0, a])
    with torch.no_grad():
        exp_v = torch.mean(log_prob * td)
    loss = -exp_v
    torch.autograd.set_detect_anomaly(True)
    loss.requires_grad_(True)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    return exp_v

# Critic
def learn(self, s, r, s_):
    s, s_ = torch.Tensor(s[np.newaxis, :]), torch.Tensor(s_[np.newaxis, :])
    v, v_ = self.critic_net(s), self.critic_net(s_)
    with torch.no_grad():
        td_error = r + GAMMA * v_ - v
    loss = td_error ** 2
    loss.requires_grad_(True)
    torch.autograd.set_detect_anomaly(True)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    return td_error
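The error in the original code seems to come from the critic's gradient being passed into the actor, or from a gradient not being computed, or something like that... I'm a beginner too and haven't fully figured it out.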
I ran into the same problem (RuntimeError about AC_CartPole.py). With the method NaturalShower posted, the error goes away and the code runs, but training does not converge.
https://zhuanlan.zhihu.com/p/511825440
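A likely reason the workaround runs but does not converge: a loss computed inside torch.no_grad() has no link back to the network, so calling requires_grad_(True) on it only turns the loss itself into a leaf tensor. The sketch below is not the repository's code; `net`, `opt`, the input shape and the reward value 1.0 are hypothetical stand-ins chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 1)  # hypothetical stand-in for critic_net
opt = torch.optim.SGD(net.parameters(), lr=0.1)
before = net.weight.detach().clone()  # snapshot of the weights

x = torch.randn(1, 4)
v = net(x)
with torch.no_grad():
    td_error = 1.0 - v  # computed without a graph: no grad_fn
loss = (td_error ** 2).mean()
loss.requires_grad_(True)  # loss becomes a leaf, still unconnected to net

opt.zero_grad()
loss.backward()  # the gradient stops at `loss` and never reaches net
opt.step()       # parameters without a gradient are skipped

print(net.weight.grad)                            # gradient seen by the layer
print(torch.equal(before, net.weight.detach()))   # were the weights left as-is?
```

Under these assumptions the layer receives no gradient and its weights stay unchanged, which would explain why the modified script runs without learning anything.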
Change line 105 to: return td_error.detach() so that the gradient is not passed along.
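A minimal sketch of why detaching helps, not the repository's code: the critic's optimizer step modifies its weights in place, so an actor loss that still carries the critic's graph through td_error fails in backward(). The helpers `make_net` and `run_step`, the GAMMA value, the learning rates, the input shapes and the use of retain_graph=True are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

GAMMA = 0.9  # discount factor; value assumed for illustration


def make_net(n_in, n_out):
    # Tiny net; the 20 hidden units echo the [20, 1] shape in the error.
    return nn.Sequential(nn.Linear(n_in, 20), nn.ReLU(), nn.Linear(20, n_out))


def run_step(detach_td):
    """One critic update, then one actor update.

    Returns True if the actor update succeeds, False if autograd raises.
    """
    torch.manual_seed(0)
    critic, actor = make_net(4, 1), make_net(4, 2)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=0.01)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=0.001)
    s, s_, r, a = torch.randn(1, 4), torch.randn(1, 4), 1.0, 0

    # Critic update: step() changes the critic weights in place.
    td_error = r + GAMMA * critic(s_) - critic(s)
    critic_loss = (td_error ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward(retain_graph=True)  # assumption: graph kept alive
    critic_opt.step()

    # The suggested fix: hand the actor a value with no graph attached.
    td = td_error.detach() if detach_td else td_error

    # Actor update: with an attached td, backward() re-enters the critic graph.
    log_prob = torch.log(torch.softmax(actor(s), dim=1)[0, a])
    actor_loss = -torch.mean(log_prob * td)
    actor_opt.zero_grad()
    try:
        actor_loss.backward()
    except RuntimeError:
        return False
    actor_opt.step()
    return True


print(run_step(detach_td=False))
print(run_step(detach_td=True))
```

On a recent PyTorch the first call is expected to fail and the second to succeed. Unlike wrapping the loss in torch.no_grad(), detaching only the returned td_error leaves both networks' own losses connected to their parameters, so both can still learn.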