Reinforcement-learning-with-PyTorch

RuntimeError about AC_CartPole.py

Open Coder-Liuu opened this issue 4 years ago • 8 comments

I didn't change anything in 8_Actor_Critic_Advantage/AC_CartPole.py. I just ran it and got this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

So I added torch.autograd.set_detect_anomaly(True) to the code, but then I got this:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

My PyTorch version is 1.7.0 and my NumPy version is 1.18.5.

Coder-Liuu avatar Nov 23 '21 06:11 Coder-Liuu
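For readers who want to see this class of error in isolation, here is a minimal sketch of one way it can arise. It is not the code from AC_CartPole.py: the `critic` network, the SGD optimizer, and the use of `retain_graph=True` are all assumptions made for illustration. The idea is that an optimizer step updates the weights in place, so a later backward pass through a graph built before the step finds a saved tensor at a newer version than expected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer critic; the 20-unit hidden layer only mirrors
# the [20, 1] shape mentioned in the error message.
critic = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 1))
optimizer = torch.optim.SGD(critic.parameters(), lr=0.1)

s = torch.randn(1, 4)
v = critic(s)                     # v holds a graph that saves the critic weights
loss = (v ** 2).mean()

optimizer.zero_grad()
loss.backward(retain_graph=True)  # assumption: the graph is kept for reuse
optimizer.step()                  # updates the weights in place

try:
    v.sum().backward()            # second backward through the now-stale graph
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

If the sketch behaves as expected, `error_message` contains the same "modified by an inplace operation" text as the report above.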

May I ask what your PyTorch environment is? Thanks.

Gera001 avatar Mar 11 '22 10:03 Gera001

After some trial-and-error changes to learn() in both the actor and the critic, I got it to run:

# Actor
def learn(self, s, a, td):
    s = torch.Tensor(s[np.newaxis, :])
    acts_prob = self.actor_net(s)
    log_prob = torch.log(acts_prob[0, a])
    with torch.no_grad():
        exp_v = torch.mean(log_prob * td)
    loss = -exp_v
    torch.autograd.set_detect_anomaly(True)
    loss.requires_grad_(True)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    return exp_v

# Critic
def learn(self, s, r, s_):
    s, s_ = torch.Tensor(s[np.newaxis, :]), torch.Tensor(s_[np.newaxis, :])
    v, v_ = self.critic_net(s), self.critic_net(s_)
    with torch.no_grad():
        td_error = r + GAMMA * v_ - v
    loss = td_error ** 2
    loss.requires_grad_(True)
    torch.autograd.set_detect_anomaly(True)
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
    return td_error

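The error in the original code seems to be caused by the critic's gradient being passed into the actor, or by a gradient not being computed, or something along those lines... I'm a beginner too and haven't really figured it out.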

NaturalShower avatar May 06 '22 12:05 NaturalShower

Your message has been received. Have a nice day!

ClownW avatar May 06 '22 12:05 ClownW

I ran into the same problem (RuntimeError about AC_CartPole.py). With the method NaturalShower posted, the error goes away and the code runs normally, but training does not converge.

henbudidiao avatar May 08 '22 13:05 henbudidiao
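A possible explanation for the lack of convergence, offered as a sketch rather than a diagnosis of the repository code: when a loss is computed under `torch.no_grad()` and then marked with `requires_grad_(True)`, it becomes a standalone leaf tensor with no link back to the network. Calling `backward()` then only fills `loss.grad`, and the parameters receive no gradient. The `net` below is a hypothetical stand-in, not the repository's network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in network for illustration only.
net = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 1))

s = torch.randn(1, 4)
v = net(s)

with torch.no_grad():
    loss = (v ** 2).mean()   # computed outside autograd: no graph back to net

loss.requires_grad_(True)    # loss is now a leaf tensor that requires grad
loss.backward()              # runs without error, but only fills loss.grad

param_grads = [p.grad for p in net.parameters()]
print(param_grads)           # every entry is None, so an optimizer step has nothing to apply
```

Under these assumptions the optimizer has no gradients to apply, which would be consistent with code that runs but does not learn.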

https://zhuanlan.zhihu.com/p/511825440

henbudidiao avatar Jul 04 '22 08:07 henbudidiao

Your message has been received. Have a nice day!

ClownW avatar Jul 04 '22 08:07 ClownW

Change line 105 to: return td_error.detach() so that the gradient is not passed along.

i-Qin avatar Aug 07 '23 09:08 i-Qin
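The detach suggestion can be sketched in a self-contained form as follows. This is an illustration under assumptions, not the repository's code: the network shapes, the Adam optimizers, the learning rates, the value of `GAMMA`, and the sample transition are all made up. The point is only that the TD error is detached before the actor uses it, so the actor's backward pass never walks into the critic's graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
GAMMA = 0.9  # assumed discount factor

# Hypothetical actor and critic for a 4-dimensional state and 2 actions.
actor = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 2), nn.Softmax(dim=-1))
critic = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-2)

# One made-up transition (s, a, r, s_).
s, s_ = torch.randn(1, 4), torch.randn(1, 4)
a, r = 1, 1.0

# Critic update: the loss keeps its graph so gradients reach the critic.
v, v_ = critic(s), critic(s_)
td_error = r + GAMMA * v_ - v
critic_loss = (td_error ** 2).mean()
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Hand the actor a detached copy, as suggested in the comment above.
td = td_error.detach()

# Actor update: gradients flow only through the actor's own graph.
log_prob = torch.log(actor(s)[0, a])
actor_loss = -(log_prob * td).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```

Unlike the `torch.no_grad()` workaround, both losses here stay attached to their own networks, so each optimizer has real gradients to apply.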

Your message has been received. Have a nice day!

ClownW avatar Aug 07 '23 09:08 ClownW