LLaMA-Factory

Published 5 months ago

Support reproducing reasoning models such as DeepSeek-R1 with GRPO, achieving results similar to huggingface open-r1

Open submartingales opened this issue 1 year ago • 53 comments

Reminder

[x] I have read the above rules and searched the existing issues.

Description

Please support reproducing DeepSeek-R1's results on top of existing models. This mainly requires integrating the GRPO algorithm, which is already implemented in git+https://github.com/huggingface/trl.git
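
For context, the core of GRPO is a group-relative advantage: several completions are sampled for the same prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group. Below is a minimal sketch of that single step in plain Python. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are illustrative assumptions, not the TRL or LLaMA-Factory API.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one prompt's sampled completions.

    Hypothetical helper for illustration only: each advantage is
    (reward - group mean) / (group std + eps).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group (an assumption)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions sampled for the same prompt, scored by a reward function.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Completions that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed. In TRL this step is handled inside its GRPO trainer, so an integration would presumably reuse that rather than reimplement it.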

Pull Request

No response

Feb 02 '25 07:02 submartingales

+1

Feb 02 '25 19:02 Syazvinski

+1

Feb 05 '25 01:02 tghfly

+1

Feb 05 '25 08:02 Kk1984up

+1

Feb 05 '25 08:02 IvanWang0730

+1

Feb 05 '25 08:02 submartingales

Roughly when will the GRPO algorithm be integrated?

Feb 05 '25 09:02 Vic-CN

Hahaha, everyone is waiting for some expert to do it. Can't you just create a branch and make the change yourselves?

Feb 06 '25 04:02 Harryjun

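> Hahaha, everyone is waiting for some expert to do it. Can't you just create a branch and make the change yourselves?

Official support is the best and most complete option.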

Feb 06 '25 15:02 submartingales

+1

Feb 07 '25 12:02 qingyuanxingsi

+1

Feb 08 '25 02:02 lsrami

+1

Feb 08 '25 06:02 plmsmile

+1

Feb 08 '25 09:02 piamo

+1

Feb 08 '25 12:02 GuodongFan

+1

Feb 10 '25 04:02 MSS444

+1

Feb 10 '25 06:02 mieco

+1

Feb 11 '25 06:02 CatIIIIIIII

+1

Feb 12 '25 08:02 yingzhao27

+1

Feb 12 '25 08:02 hongliang-wei

+1

Feb 12 '25 08:02 xiaoSUM

+1

Feb 12 '25 08:02 RRRRRayyyyy

+1

Feb 12 '25 09:02 AuroraJump

+1

Feb 12 '25 09:02 jcjajx123

+1

Feb 13 '25 07:02 Brucewuzhang

+10

Feb 13 '25 08:02 hpx502766238

ModelScope's ms-swift already supports GRPO; its code could be merged over.

Feb 14 '25 00:02 tghfly

+1

Feb 14 '25 02:02 qingy1337

+1

Feb 14 '25 11:02 taddeusb90

+1

Feb 14 '25 13:02 iqraameer2489

+1

Feb 14 '25 15:02 zhing2006

+10086

Feb 15 '25 07:02 Christoph-XJ