Florian Bauer
### 🚀 Feature

Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm introduced in https://arxiv.org/pdf/2402.03300. It has drawn significant attention following its use in fine-tuning DeepSeek-R1. GRPO...
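GRPO's core idea is to sample a group of outputs for the same prompt and score each output relative to the others in its group, rather than against a learned value (critic) baseline. The sketch below covers only that group-relative advantage step under outcome supervision: each reward is normalized by the group mean and standard deviation. The function name `group_relative_advantages` and the `eps` stabilizer are hypothetical. Using the population standard deviation is also an assumption, since implementations differ on this detail. The clipped policy-ratio objective and the KL penalty from the paper are omitted.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: advantages for G outputs sampled from one prompt.

    Each output's advantage is its reward minus the group mean, divided by
    the group standard deviation. Population std (pstdev) is an assumption
    here; some implementations use the sample std instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every output in the group gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers, two judged correct (reward 1) and two incorrect (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)  # roughly [1.0, -1.0, 1.0, -1.0]
```

Because the baseline comes from the group itself, a group in which every output earns the same reward produces zero advantage for all of them. In that case the policy receives no push toward or away from any output.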
enhancement