verl
[model] feat: add qwen25 gspo 3b script on ASCEND NPU
What does this PR do?
Adds examples/gspo_trainer/test_gspo_3b_math_npu.sh, an example script for running GSPO training of a Qwen2.5 3B model on Ascend NPU.