yesiam-png
Hi, thanks for your excellent work and contribution! I have two questions: 1. Since I'm using Google Colab, it seems I can't start the `meshcat-server` before running the training code...
Dear authors, 1. I noticed that the reference policy is fixed to the initial policy rather than being updated to the previous iteration's policy. May I ask the reason for this...
Dear authors, may I ask how to train the iterative DPO baseline model using this repo? Is there a convenient way to modify the SPPO code to do this?
Hi, thanks for the awesome framework! Do you have any update or timeline for the code for training with PPO and PRMs? I just want to decide if I have...