trl
GRPO as part of HF TRL?
Feature request
Qwen2.5-Math and Qwen2.5-Coder are two state-of-the-art models that have recently integrated GRPO (Group Relative Policy Optimization).
Motivation
https://qwenlm.github.io/blog/qwen2.5-math/
https://arxiv.org/pdf/2402.03300
Your contribution
This is a request only; I am not planning to contribute an implementation.
Hello @JumpingRain, there is an open PR for this, #1954, which is currently under development.
+1 It would be great to have this, looking forward to it!