trl

Why is KL = NaN when training with GRPO?

Open uilstong opened this issue 4 months ago • 1 comment

The question is described here: https://github.com/huggingface/open-r1/issues/704. Can somebody help me?

uilstong avatar Sep 09 '25 02:09 uilstong

The beta parameter of GRPOConfig (documentation here) must be explicitly set to a non-zero value. Only then is the reference model loaded and the KL term computed and logged.
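For illustration, a minimal sketch of what that looks like in a training script. The value 0.04 and the output directory name are assumptions chosen for the example, not recommended settings. Use whatever non-zero KL coefficient suits your run.

```python
from trl import GRPOConfig

# Assumption: 0.04 is only an illustrative non-zero value; tune it for your run.
# Assumption: "grpo-output" is a hypothetical output directory.
training_args = GRPOConfig(
    output_dir="grpo-output",
    beta=0.04,  # non-zero KL coefficient, so the reference model is loaded and KL is computed
)

# Pass training_args to GRPOTrainer through its args parameter as usual.
```

With beta left at zero there is no KL term in the loss, which matches the missing KL values described in the question.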

xuanduy04 avatar Nov 26 '25 10:11 xuanduy04