Deep-reinforcement-learning-with-pytorch
Bug in REINFORCE with baseline
The value network update should be:
alpha_w = 1e-3  # initialize the learning rate
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)  # optimize the value network's parameters
optimizer_w.zero_grad()
policy_loss_w = -delta
policy_loss_w.backward(retain_graph=True)
clip_grad_norm_(s_value_func.parameters(), 0.1)  # clip the parameters' gradients, not the loss tensor
optimizer_w.step()
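For reference, here is a minimal self-contained sketch of a value-network update along these lines. It is not the repository's code: the network shape, `state`, and the return `G` are made up for illustration, and it swaps in a squared-error loss on `delta` instead of `-delta`, which is my own substitution. It also detaches `delta` before it would be reused for the policy loss, so the two backward passes do not share a graph.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)

# Hypothetical stand-in for the value network; the real s_value_func may differ.
s_value_func = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

alpha_w = 1e-3
# Build the optimizer once, outside the training loop, so Adam keeps its state.
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)

state = torch.randn(1, 4)    # made-up state
G = torch.tensor([[1.0]])    # made-up return observed from that state

value = s_value_func(state)
delta = G - value            # return minus baseline

# Squared error on delta; its gradient is -delta * grad(value).
value_loss = 0.5 * delta.pow(2).mean()

optimizer_w.zero_grad()
value_loss.backward()
# Clip the gradients stored on the parameters, not the loss tensor.
clip_grad_norm_(s_value_func.parameters(), 0.1)
optimizer_w.step()

# Detached copy for weighting the policy loss; no graph is attached to it.
policy_weight = delta.detach()
```

Because `policy_weight` carries no graph, the policy update can call `backward()` on its own loss without `retain_graph=True`.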
There is an error in this code. When I run it, it raises an error about the computation graph. Did you run into the same problem?
Same problem here. You can debug it step by step to see the errors.
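One way to make step-by-step debugging easier is PyTorch's anomaly detection, which reports the forward operation behind a failing backward pass. The sketch below is only an illustration: it assumes the failure is the common "modified by an inplace operation" kind, which may not be what this code hits, and the tensors `x` and `y` are made up.

```python
import torch

error_message = None

# Anomaly mode records forward tracebacks so a failing backward can point to its source.
with torch.autograd.detect_anomaly():
    x = torch.ones(3, requires_grad=True)
    y = x.exp()    # exp saves its output for the backward pass
    y.add_(1)      # in-place edit invalidates that saved output
    try:
        y.sum().backward()
    except RuntimeError as err:
        error_message = str(err)

print(error_message)
```

Anomaly mode slows training noticeably, so it is best switched on only while hunting the error.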