耿飚 comments


Results: 2 comments of 耿飚

/chapter5/chapter5_questions&keywords

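> @JimmyYoungggg
> May I ask how often θk is updated in the PPO algorithm? If it is updated on every iteration, wouldn't the sampling efficiency still be low?

`# update policy every n steps` `if self.sample_count % self.update_freq != 0: return` Looking at the code, you can set the update frequency yourself.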
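To make the gating logic concrete, here is a minimal sketch of how such a check might behave. Only `sample_count` and `update_freq` come from the quoted code. The `UpdateGate` class, the `step` method, and the `num_updates` counter are hypothetical names for illustration, and the sketch assumes `sample_count` is incremented once per sampled step.

```python
class UpdateGate:
    """Hypothetical helper: counts sampled steps and signals when to update."""

    def __init__(self, update_freq):
        self.update_freq = update_freq  # assumed: number of steps between updates
        self.sample_count = 0
        self.num_updates = 0

    def step(self):
        """Record one sampled step; return True if an update would run."""
        self.sample_count += 1
        # update policy every n steps
        if self.sample_count % self.update_freq != 0:
            return False
        self.num_updates += 1  # stand-in for the actual PPO update
        return True


gate = UpdateGate(update_freq=4)
flags = [gate.step() for _ in range(10)]
print(flags, gate.num_updates)
```

With `update_freq=4`, the update runs only on steps 4 and 8 of the 10 sampled steps, so a larger `update_freq` means more samples are collected per policy update.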

/chapter5/chapter5_questions&keywords

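> @Strawberry47
> Ah, I have one more question about the code, on how critic_loss is computed: is it Q_value(old) - critic_value(new)? I'm not sure whether I understand it correctly.

`critic_loss = (returns - values).pow(2).mean()` This computes an MSE. The critic network is used to estimate V.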
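A small worked example may help here. It is a sketch in plain Python rather than the tensor code quoted above, and it assumes that `returns` holds discounted returns computed from sampled rewards, which the comment does not state. The helper names `discounted_returns` and `critic_mse` and the sample numbers are made up for illustration.

```python
def discounted_returns(rewards, gamma):
    """Compute the discounted return from each step to the end of the episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def critic_mse(returns, values):
    """Mean squared error between target returns and the critic's V estimates."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)


rewards = [1.0, 1.0, 1.0]   # made-up sampled rewards
values = [2.0, 1.0, 1.0]    # made-up critic outputs V(s_t)
returns = discounted_returns(rewards, gamma=0.5)
critic_loss = critic_mse(returns, values)
print(returns, critic_loss)
```

Under that assumption, the loss pulls the critic's output V(s_t) toward the return observed from step t, which matches the statement that the critic estimates V.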