Liger-Kernel
Support DAPO Chunked loss
🚀 The feature, motivation and pitch
ByteDance's DAPO is an open-source, state-of-the-art RL algorithm that scores 50 points on AIME 2024 with the Qwen2.5-32B pre-trained model, surpassing the previous SOTA result achieved with DeepSeek's GRPO. The original DAPO code is now publicly available.
Alternatives
No response
Additional context
No response
Hi! I'd like to take on this issue.
🚀 Thanks! Assigned.