Support DAPO Chunked loss

Open qingquansong opened this issue 8 months ago • 2 comments

🚀 The feature, motivation and pitch

ByteDance's DAPO is an open-source, state-of-the-art (SOTA) RL algorithm that scores 50 points on AIME 2024 using the Qwen2.5-32B pre-trained model, surpassing the previous SOTA result achieved with DeepSeek's GRPO. The original DAPO code is now publicly available.

Alternatives

No response

Additional context

No response

qingquansong, Mar 21 '25 08:03

Hi! I'd like to take on this issue.

srzhu97, Mar 21 '25 08:03

> Hi! I'd like to take on this issue.

🚀 Thanks! Assigned.

qingquansong, Mar 21 '25 08:03