
GRPO as part of HF TRL?

Open · JumpingRain opened this issue 1 year ago · 1 comment

Feature request

Qwen2.5-Math and Qwen2.5-Coder are two state-of-the-art models whose training recently integrated GRPO (Group Relative Policy Optimization).

Motivation

- Qwen2.5-Math blog post: https://qwenlm.github.io/blog/qwen2.5-math/
- GRPO paper (DeepSeekMath): https://arxiv.org/pdf/2402.03300
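For context, the central idea of GRPO in the linked paper is to drop the learned value baseline and instead sample a group of completions per prompt, then score each completion relative to its own group. Below is a minimal sketch of that advantage step only (not the full policy update), assuming one scalar outcome reward per completion. The function name `group_relative_advantages` and the `eps` guard are hypothetical illustrations, not TRL's API, and using the population standard deviation is an assumption.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group.

    `rewards` is a list of groups; each group holds the scalar rewards of
    the completions sampled for one prompt. (Hypothetical helper.)
    """
    advantages = []
    for group in rewards:
        mean = statistics.fmean(group)
        # Assumption: population std; `eps` avoids division by zero
        # when every completion in the group got the same reward.
        std = statistics.pstdev(group)
        advantages.append([(r - mean) / (std + eps) for r in group])
    return advantages


# Two prompts, four sampled completions each (e.g. 1.0 = correct answer).
adv = group_relative_advantages([[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]])
print(adv)
```

In the first group, correct completions get an advantage of roughly +1 and incorrect ones roughly -1. The all-correct second group gets 0 everywhere, so it contributes no learning signal.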

Your contribution

This is a request only; I am not contributing an implementation.

JumpingRain · Sep 23 '24 14:09

Hello @JumpingRain, there is an open PR for this in #1954 that is currently under development.

lewtun · Sep 24 '24 08:09

+1, it would be great to have this. Looking forward to it!

fzyzcjy · Nov 30 '24 06:11