Integrate GRPO to support reproducing reasoning models such as DeepSeek-R1, achieving results similar to huggingface open-r1
Reminder
- [x] I have read the above rules and searched the existing issues.
Description
Support reproducing DeepSeek-R1's results on top of existing models. The main requirement is integrating the GRPO algorithm, which is already implemented in git+https://github.com/huggingface/trl.git
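
For context, the core of GRPO is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value model is needed. Below is a minimal sketch of that group-relative advantage step only, not of the trl implementation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and using the population standard deviation is an assumption made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration; `eps` is an assumed small constant
    that avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one prompt, scored by a rule-based reward
# (for example 1.0 for a correct final answer, 0.0 otherwise).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and those below get a negative one; the policy update then weights each completion's token log-probabilities by its advantage.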
Pull Request
No response
+1
+1
+1
+1
+1
Roughly when will we integrate the GRPO algorithm?
Haha, everyone is waiting for an expert to do it. Can't you just create a branch and make the changes yourselves?
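> Haha, everyone is waiting for an expert to do it. Can't you just create a branch and make the changes yourselves?

Official support is the best and the most complete.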
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+1
+10
ModelScope's ms-swift already supports GRPO; that code could be merged over.
+1
+1
+1
+1
+10086