PaddleNLP

Integrate DataProto into the GRPO

Open WanpengXu opened this issue 6 months ago • 4 comments

This PR makes a first pass at introducing the DataProto type into the GRPO algorithm and tests the train and eval flows.

WanpengXu avatar May 15 '25 07:05 WanpengXu

Thanks for your contribution!

paddle-bot[bot] avatar May 15 '25 07:05 paddle-bot[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 15 '25 07:05 CLAassistant

Codecov Report

:x: Patch coverage is 0% with 407 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 46.83%. Comparing base (7309a5d) to head (f055a44). :warning: Report is 87 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddlenlp/datasets/rlhf_datasets/protocol.py | 0.00% | 190 Missing :warning: |
| paddlenlp/rl/trainer/ppo_trainer.py | 0.00% | 99 Missing :warning: |
| paddlenlp/rl/utils/comm_utils.py | 0.00% | 71 Missing :warning: |
| paddlenlp/rl/trainer/actor_trainer.py | 0.00% | 29 Missing :warning: |
| paddlenlp/rl/trainer/critic_trainer.py | 0.00% | 12 Missing :warning: |
| paddlenlp/rl/trainer/reward_trainer.py | 0.00% | 6 Missing :warning: |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10597      +/-   ##
===========================================
- Coverage    46.93%   46.83%   -0.11%     
===========================================
  Files          800      800              
  Lines       132741   132936     +195     
===========================================
- Hits         62304    62254      -50     
- Misses       70437    70682     +245     

codecov[bot] avatar Jun 10 '25 04:06 codecov[bot]

Please add the copyright notice at the very top of protocol.py:

# Copyright 2024 Bytedance Ltd. and/or its affiliates

DrownFish19 avatar Jun 12 '25 09:06 DrownFish19

Where is the logic for the automatic gather and concat?

gongel avatar Jun 18 '25 07:06 gongel

> Where is the logic for the automatic gather and concat?

For now this is done by manually calling the decorator function gather_tensor_list to wrap pad_or_concat_tensor_list, so that a gather runs automatically before the concat. gather_tensor_list needs the data_parallel_group and sharding_parallel_group arguments, and fetching distributed parameters inside protocol.py felt a bit odd, so I did not write it in protocol.py in the form @gather_tensor_list(data_parallel_group=dp_group, sharding_parallel_group=sp_group). See https://github.com/PaddlePaddle/PaddleNLP/blob/f055a448189c1de5855325de902c6f32cf325034/paddlenlp/rl/trainer/ppo_trainer.py#L1517-L1519
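To make the pattern concrete, here is a minimal, framework-free sketch of wrapping a concat function with a parameterized decorator at the call site instead of using `@` syntax at definition time. Only the names gather_tensor_list, pad_or_concat_tensor_list, data_parallel_group and sharding_parallel_group come from this thread; the function bodies, the dict-based "groups" and the simulated gather are assumptions for illustration and are not the code in ppo_trainer.py or protocol.py.

```python
import functools


def gather_tensor_list(data_parallel_group=None, sharding_parallel_group=None):
    """Hypothetical stand-in: decorator factory that gathers before calling func."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(tensor_list, *args, **kwargs):
            gathered = list(tensor_list)
            for group in (data_parallel_group, sharding_parallel_group):
                if group is not None:
                    # Simulated all-gather (assumption): pretend every rank in
                    # the group holds the same local list and contributes it.
                    gathered = gathered * group["nranks"]
            return func(gathered, *args, **kwargs)

        return wrapper

    return decorator


def pad_or_concat_tensor_list(tensor_list):
    """Hypothetical stand-in: concatenates a list of lists into one flat list."""
    return [x for t in tensor_list for x in t]


# The groups are only known where the trainer runs, so the wrapping happens
# there rather than as an @decorator next to the function definition.
dp_group = {"nranks": 2}  # placeholder for a real communication group
sp_group = None

gather_then_concat = gather_tensor_list(
    data_parallel_group=dp_group, sharding_parallel_group=sp_group
)(pad_or_concat_tensor_list)

local = [[1, 2], [3]]
result = gather_then_concat(local)
```

Wrapping at the call site keeps the undecorated function free of any distributed state; the trade-off is that every caller has to remember to apply the wrapper.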

WanpengXu avatar Jun 18 '25 07:06 WanpengXu