xuanduy04

The `beta` parameter of `GRPOConfig` ([documentation here](https://github.com/huggingface/trl/blob/cac9f1d8e24eb3aeb40c06a99d27a88f4b3d1c83/trl/trainer/grpo_config.py#L159C1-L159C9)) must be explicitly set to a non-zero value; otherwise the reference model is not loaded and no KL data is reported.
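
For illustration, a minimal config sketch of what that looks like. The `output_dir` path and the value `0.04` are placeholders I picked, not recommendations; any non-zero `beta` should have the same effect on reference-model loading.

```python
from trl import GRPOConfig

# Assumption: "grpo-output" and 0.04 are illustrative placeholder values.
training_args = GRPOConfig(
    output_dir="grpo-output",
    beta=0.04,  # non-zero, so the reference model is loaded and KL can be computed
)

# Pass `training_args` as the `args` argument when constructing GRPOTrainer.
```

Leaving `beta` at `0.0` skips the reference model, so KL data will not be available.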