PaddleNLP
Integrate DataProto into the GRPO
This PR takes a first step toward using the DataProto type in the GRPO algorithm, and the train and eval flows have been tested with it.
Thanks for your contribution!
Codecov Report
:x: Patch coverage is 0% with 407 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 46.83%. Comparing base (7309a5d) to head (f055a44).
:warning: Report is 87 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #10597 +/- ##
===========================================
- Coverage 46.93% 46.83% -0.11%
===========================================
Files 800 800
Lines 132741 132936 +195
===========================================
- Hits 62304 62254 -50
- Misses 70437 70682 +245
Please add the copyright notice at the very top of protocol.py:
# Copyright 2024 Bytedance Ltd. and/or its affiliates
Where is the logic for the automatic gather and concat?
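For now this is done by manually calling the decorator function gather_tensor_list to wrap pad_or_concat_tensor_list, so that the gather runs automatically before the concat.
gather_tensor_list needs the data_parallel_group and sharding_parallel_group arguments, and fetching distributed parameters inside protocol.py felt a bit odd, so it is not written in protocol.py in the form @gather_tensor_list(data_parallel_group=dp_group, sharding_parallel_group=sp_group). The wrapping happens in the trainer instead: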
https://github.com/PaddlePaddle/PaddleNLP/blob/f055a448189c1de5855325de902c6f32cf325034/paddlenlp/rl/trainer/ppo_trainer.py#L1517-L1519
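For readers following the thread, here is a minimal sketch of the pattern described above: a parameterized decorator applied by hand at the call site instead of with `@` syntax at definition time. Everything in it is an assumption for illustration. `FakeGroup` is a hypothetical stand-in for a process group, the function bodies are simplified to plain Python lists, and only the names `gather_tensor_list`, `pad_or_concat_tensor_list`, `data_parallel_group` and `sharding_parallel_group` come from the comment. It is not the actual PaddleNLP code.

```python
from functools import wraps


class FakeGroup:
    """Hypothetical stand-in for a process group: holds the other ranks' rows."""

    def __init__(self, per_rank_data):
        self.per_rank_data = per_rank_data


def gather_tensor_list(data_parallel_group, sharding_parallel_group):
    """Decorator factory (sketch): gather rows across the groups, then call func."""

    def decorator(func):
        @wraps(func)
        def wrapper(local_list):
            gathered = list(local_list)
            for group in (data_parallel_group, sharding_parallel_group):
                if group is None:
                    continue  # this parallel dimension is not in use
                for remote_rows in group.per_rank_data:
                    gathered.extend(remote_rows)
            return func(gathered)

        return wrapper

    return decorator


def pad_or_concat_tensor_list(tensor_list):
    """Simplified stand-in: right-pad every row with 0 to the longest row."""
    max_len = max(len(row) for row in tensor_list)
    return [row + [0] * (max_len - len(row)) for row in tensor_list]


# Trainer side: the groups are known here, so the wrapping happens here and
# the module that defines pad_or_concat_tensor_list stays free of
# distributed state.
dp_group = FakeGroup([[[4, 5, 6]]])  # one other rank holding a single row
sp_group = None

gather_and_concat = gather_tensor_list(
    data_parallel_group=dp_group,
    sharding_parallel_group=sp_group,
)(pad_or_concat_tensor_list)

batch = gather_and_concat([[1, 2], [3]])
print(batch)  # [[1, 2, 0], [3, 0, 0], [4, 5, 6]]
```

The trade-off in this sketch is that the undecorated function stays importable and testable without any distributed setup, at the cost of each caller having to remember to wrap it.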