Did you solve this problem?
> Is your question about the metrics to use?

Yes. How can I measure whether a reward model is beneficial for PPO training?
Why is the ppo/policy/loss metric always negative?
I would appreciate your guidance. Thank you.
Thank you.