Reinforcement-learning-with-PyTorch RuntimeError about AC

I didn't change anything in 8_Actor_Critic_Advantage/AC_CartPole.py. I just ran it, and got this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

So I added torch.autograd.set_detect_anomaly(True) to the code, but then I got this:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

My PyTorch version is 1.7.0 and my NumPy version is 1.18.5.

Nov 23 '21 06:11 Coder-Liuu

What is your PyTorch environment? Thanks.

Mar 11 '22 10:03 Gera001

After some trial-and-error changes to learn() in both the actor and the critic, I got it to run:

    # Actor
    def learn(self, s, a, td):
        s = torch.Tensor(s[np.newaxis, :])
        acts_prob = self.actor_net(s)
        log_prob = torch.log(acts_prob[0, a])
        with torch.no_grad():
            exp_v = torch.mean(log_prob * td)
        loss = -exp_v
        torch.autograd.set_detect_anomaly(True)
        loss.requires_grad_(True)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return exp_v

    # Critic
    def learn(self, s, r, s_):
        s, s_ = torch.Tensor(s[np.newaxis, :]), torch.Tensor(s_[np.newaxis, :])
        v, v_ = self.critic_net(s), self.critic_net(s_)
        with torch.no_grad():
            td_error = r + GAMMA * v_ - v
        loss = td_error ** 2
        loss.requires_grad_(True)
        torch.autograd.set_detect_anomaly(True)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return td_error

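The error in the original code seems to come from the critic's gradient being propagated into the actor, or from a gradient not being computed, or something along those lines... I'm a beginner too and haven't fully figured it out.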
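The guess above can be checked with a small sketch. It is not the repository's code: the networks, GAMMA value, learning rates, and the run_step helper are hypothetical stand-ins, and retain_graph=True is an assumption made so the sketch hits the in-place error and not a "graph already freed" one. If the diagnosis holds, the likely mechanism is that the critic's optimizer.step() updates its weights in place, and the actor's backward() then walks through td_error into the critic's graph, where those weights were saved. Detaching td_error before handing it to the actor avoids that while still letting both networks receive gradients. Note that computing a loss under torch.no_grad() and then calling requires_grad_(True) on it makes the error go away, but it also cuts the loss off from the network, so the parameters get no gradient.

```python
import torch
import torch.nn as nn

GAMMA = 0.9  # assumed discount factor, for illustration only


def run_step(detach_td: bool) -> str:
    """Run one critic update followed by one actor update; report the outcome."""
    torch.manual_seed(0)
    # Hypothetical stand-ins for the repo's networks (4 state dims, 2 actions).
    critic_net = nn.Sequential(nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 1))
    actor_net = nn.Sequential(
        nn.Linear(4, 20), nn.ReLU(), nn.Linear(20, 2), nn.Softmax(dim=-1)
    )
    critic_opt = torch.optim.Adam(critic_net.parameters(), lr=0.01)
    actor_opt = torch.optim.Adam(actor_net.parameters(), lr=0.001)
    s, s_, r, a = torch.randn(1, 4), torch.randn(1, 4), 1.0, 0

    # Critic update: optimizer.step() modifies the critic weights in place.
    td_error = r + GAMMA * critic_net(s_) - critic_net(s)
    critic_loss = (td_error ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward(retain_graph=True)
    critic_opt.step()

    # Detaching turns td_error into a plain constant for the actor.
    td = td_error.detach() if detach_td else td_error

    # Actor update: without the detach, backward() walks into the critic's
    # graph, whose saved weights were just changed by critic_opt.step().
    log_prob = torch.log(actor_net(s)[0, a])
    actor_loss = -torch.mean(log_prob * td)
    actor_opt.zero_grad()
    try:
        actor_loss.backward()
    except RuntimeError as err:
        return f"failed: {err}"
    actor_opt.step()
    return "ok"


print(run_step(detach_td=False))
print(run_step(detach_td=True))
```

In the repository's layout, the equivalent change would presumably be a single line: have the critic's learn() return td_error.detach() and leave the actor's learn() as it was.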