PaddleNLP Integrate DataProto into the GRPO

This PR makes a first pass at introducing the DataProto type into the GRPO algorithm and tests the train and eval flows.

May 15 '25 07:05 WanpengXu

Thanks for your contribution!

May 15 '25 07:05 paddle-bot[bot]

All committers have signed the CLA.

May 15 '25 07:05 CLAassistant

Codecov Report

:x: Patch coverage is 0% with 407 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 46.83%. Comparing base (7309a5d) to head (f055a44). :warning: Report is 87 commits behind head on develop.

Files with missing lines	Patch %	Lines
paddlenlp/datasets/rlhf_datasets/protocol.py	0.00%	190 Missing :warning:
paddlenlp/rl/trainer/ppo_trainer.py	0.00%	99 Missing :warning:
paddlenlp/rl/utils/comm_utils.py	0.00%	71 Missing :warning:
paddlenlp/rl/trainer/actor_trainer.py	0.00%	29 Missing :warning:
paddlenlp/rl/trainer/critic_trainer.py	0.00%	12 Missing :warning:
paddlenlp/rl/trainer/reward_trainer.py	0.00%	6 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #10597      +/-   ##
===========================================
- Coverage    46.93%   46.83%   -0.11%     
===========================================
  Files          800      800              
  Lines       132741   132936     +195     
===========================================
- Hits         62304    62254      -50     
- Misses       70437    70682     +245

Jun 10 '25 04:06 codecov[bot]

Please add the copyright notice at the very top of protocol.py:

# Copyright 2024 Bytedance Ltd. and/or its affiliates

Jun 12 '25 09:06 DrownFish19

Where is the logic for the automatic gather and concat?

Jun 18 '25 07:06 gongel

Where is the logic for the automatic gather and concat?

For now this is done by manually calling the decorator function gather_tensor_list to wrap pad_or_concat_tensor_list, so that a gather happens automatically before the concat. gather_tensor_list needs the data_parallel_group and sharding_parallel_group arguments, and fetching distributed parameters inside protocol.py feels a bit odd, so I did not write it in protocol.py in the form @gather_tensor_list(data_parallel_group=dp_group, sharding_parallel_group=sp_group). See https://github.com/PaddlePaddle/PaddleNLP/blob/f055a448189c1de5855325de902c6f32cf325034/paddlenlp/rl/trainer/ppo_trainer.py#L1517-L1519

Jun 18 '25 07:06 WanpengXu
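
For readers following the reply above, the pattern it describes (a decorator that takes arguments, applied by hand at the call site instead of with @ syntax at definition time) can be sketched roughly as follows. This is only an illustration under loud assumptions: the two function names are taken from the comment, but their bodies and signatures here are hypothetical stand-ins, the "groups" are plain Python lists rather than real communication groups, and gathering is simulated by list concatenation. It is not the code in ppo_trainer.py or protocol.py.

```python
from functools import wraps


def gather_tensor_list(data_parallel_group, sharding_parallel_group):
    """Decorator factory: gather inputs across the groups, then call fn.

    Hypothetical stand-in: each "group" is a list of per-peer item lists,
    and gathering is simulated by extending the local list.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(local_items):
            gathered = list(local_items)
            for group in (data_parallel_group, sharding_parallel_group):
                for peer_items in group:
                    gathered.extend(peer_items)
            return fn(gathered)
        return wrapper
    return decorator


def pad_or_concat_tensor_list(items):
    """Hypothetical stand-in: right-pad each row with 0 to the longest row."""
    width = max(len(row) for row in items)
    return [row + [0] * (width - len(row)) for row in items]


# The groups are only known at the call site (for example inside a trainer),
# so the decorator is applied manually there rather than at definition time.
dp_group = [[[4, 5, 6]]]  # one simulated peer holding one row
sp_group = []             # no simulated sharding peers
gather_then_concat = gather_tensor_list(
    data_parallel_group=dp_group,
    sharding_parallel_group=sp_group,
)(pad_or_concat_tensor_list)

batch = gather_then_concat([[1, 2]])
```

Wrapping at the call site keeps the undecorated function free of any distributed state, which matches the reason given in the reply for not using the @ form inside protocol.py.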
