GRPO Training Speed Testing
I tested the speed of ms-swift GRPO training under different settings and compared it with verl under the same training hyperparameters.
Test Setup:
Model: Qwen2.5-7B-Instruct with GRPO full parameter training
Dataset: AI-MO/NuminaMath-TIR
Per device batch size: 16
Number of generations: 8
Max steps: 50
Number of devices: 8 (8 * NVIDIA A800-SXM4-80GB)
Variable Explanation:
1/2 lmdeploy: Means using 1 or 2 GPUs for lmdeploy rollout inference (num_infer_workers).
sync/async: Indicates whether asynchronous generation is used (async_generate).
mu: Number of optimization iterations per rollout batch (num_iterations).
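To make the settings concrete, below is a minimal sketch of how such a run could be launched. This is a reconstruction rather than the exact command used: the flag names follow ms-swift's GRPO examples and may differ across versions, and `--use_lmdeploy` and the `NPROC_PER_NODE` launcher variable in particular are assumptions on my part.

```python
import os
import subprocess

# Hedged sketch of an ms-swift GRPO launch matching the setup above.
# Flag names follow ms-swift's GRPO examples and may vary by version.
cmd = [
    "swift", "rlhf",
    "--rlhf_type", "grpo",
    "--model", "Qwen/Qwen2.5-7B-Instruct",
    "--train_type", "full",                  # full-parameter training
    "--dataset", "AI-MO/NuminaMath-TIR",
    "--per_device_train_batch_size", "16",
    "--num_generations", "8",
    "--max_steps", "50",
    "--use_lmdeploy", "true",                # lmdeploy rollout backend (assumed flag)
    "--num_infer_workers", "2",              # "1/2 lmdeploy": GPUs reserved for rollout
    "--async_generate", "true",              # "sync/async"
    "--num_iterations", "1",                 # "mu"
]
env = {**os.environ, "NPROC_PER_NODE": "8"}  # 8 x A800; assumed launcher variable
subprocess.run(cmd, check=True, env=env)
```

Varying `num_infer_workers`, `async_generate`, and `num_iterations` across runs reproduces the grid of settings compared below.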
Experimental Results:
Across the different ms-swift settings, the 50-step run took between 7 and 30 minutes, with per-step times ranging from 8 to 36 seconds. In contrast, the verl run took approximately 60-70 seconds per step.
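For reference, those per-step times imply roughly the following wall-clock totals for the 50-step run (65 s is simply the midpoint of the reported 60-70 s range):

```python
# Wall-clock totals implied by the reported per-step times over 50 steps.
steps = 50
per_step_seconds = {
    "ms-swift, fastest setting": 8,
    "ms-swift, slowest setting": 36,
    "verl (midpoint of 60-70 s)": 65,
}
for name, sec in per_step_seconds.items():
    print(f"{name}: ~{steps * sec / 60:.0f} min total")
# -> ~7 min, ~30 min, and ~54 min respectively
```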
Confusion: verl seems relatively slow here. I'm not sure whether I have misconfigured some parameters. Open to discussion!
The related report can be found at https://api.wandb.ai/links/jinghan0119-zhejiang-university/1ofvr2w1
Maybe it's because you used "lora" fine-tuning for MS-SWIFT, while verl only supports "full" fine-tuning?