Yutong comments

Results: 4 comments of Yutong

[BUG]: train_rm.py get lower acc!

Yes, it is trained for one full epoch

There are several issues waiting to be treated

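Hello, thanks for asking ^^ I have already replied to some of the repo-related questions on 论道, and the platform-related issues have been passed on to the team; we will update as soon as possible. Many thanks! Yutongamber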

A bug in the training code causes the greedy policy to fail

Hello there, thank you for asking. Your proposed fix has been checked and is indeed correct, so you can directly submit a pull request ^^ Thanks again. Yutong

A question about the official PPO

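Hello, when designing the PPO training we found that if all trajectories are kept for learning, the algorithm converges very slowly; the data from episodes that did not reach the goal is discarded to speed up convergence. The main reason is that the environment only returns a reward at the final step, so the reward is too sparse. Early in training, an agent following a random policy rarely reaches the goal, so all rewards stay at 0 and the updates have no effect. Thanks for asking, discussion is welcome ^^~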
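To make the filtering idea concrete, here is a minimal sketch in Python. It is not the official PPO code: the `Trajectory` class, the `reached_goal` flag, and the helper names are hypothetical and only illustrate why episodes that never reach the goal carry no learning signal when the reward arrives only at the last step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one episode (not from the official repo)."""
    rewards: List[float]   # per-step rewards; only the last step may be non-zero
    reached_goal: bool     # assumed flag set by the environment wrapper


def keep_successful(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Drop episodes that never reached the goal."""
    return [t for t in trajectories if t.reached_goal]


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return for every step of one episode."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


batch = [
    Trajectory(rewards=[0.0, 0.0, 0.0], reached_goal=False),  # failed episode
    Trajectory(rewards=[0.0, 0.0, 1.0], reached_goal=True),   # reached the goal
]

# A failed episode has all-zero returns, so it contributes nothing useful.
failed_returns = discounted_returns(batch[0].rewards)

# Only the successful episode is passed on to the PPO update.
kept = keep_successful(batch)
kept_returns = [discounted_returns(t.rewards, gamma=0.5) for t in kept]
```

With a sparse terminal reward, every return in a failed episode is zero, which is the situation described above. Note that dropping failed episodes changes the data distribution the policy learns from, so this is a trade-off made for faster convergence rather than a general recommendation.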