GRPO Training Speed Testing
I tested the speed of ms-swift GRPO training under different settings and compared it with verl under the same training hyperparameters.
Test Setup:
Model: Qwen2.5-7B-Instruct with GRPO full parameter training
Dataset: AI-MO/NuminaMath-TIR
Per device batch size: 16
Number of generations: 8
Max steps: 50
Number of devices: 8 (8 * NVIDIA A800-SXM4-80GB)
Variable Explanation:
1/2 lmdeploy: Means using 1 or 2 GPUs for lmdeploy rollout inference (num_infer_workers).
sync/async: Indicates whether asynchronous generation is used (async_generate).
mu: Number of optimization iterations per rollout batch (num_iterations).
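To make the settings concrete, below is a minimal sketch of how such a run could be launched. This is a reconstruction rather than the exact command used: the flag names follow ms-swift's GRPO examples and may differ across versions, and `--use_lmdeploy` and the `NPROC_PER_NODE` launcher variable in particular are assumptions on my part.

```python
import os
import subprocess

# Hedged sketch of an ms-swift GRPO launch matching the setup above.
# Flag names follow ms-swift's GRPO examples and may vary by version.
cmd = [
    "swift", "rlhf",
    "--rlhf_type", "grpo",
    "--model", "Qwen/Qwen2.5-7B-Instruct",
    "--train_type", "full",                  # full-parameter training
    "--dataset", "AI-MO/NuminaMath-TIR",
    "--per_device_train_batch_size", "16",
    "--num_generations", "8",
    "--max_steps", "50",
    "--use_lmdeploy", "true",                # lmdeploy rollout backend (assumed flag)
    "--num_infer_workers", "2",              # "1/2 lmdeploy": GPUs reserved for rollout
    "--async_generate", "true",              # "sync/async"
    "--num_iterations", "1",                 # "mu"
]
env = {**os.environ, "NPROC_PER_NODE": "8"}  # 8 x A800; assumed launcher variable
subprocess.run(cmd, check=True, env=env)
```

Varying `num_infer_workers`, `async_generate`, and `num_iterations` across runs reproduces the grid of settings compared below.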
Experimental Results:
Across the different ms-swift settings, the 50-step run took between 7 and 30 minutes, with per-step times ranging from 8 to 36 seconds. In contrast, the verl run took approximately 60-70 seconds per step.
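For reference, those per-step times imply roughly the following wall-clock totals for the 50-step run (65 s is simply the midpoint of the reported 60-70 s range):

```python
# Wall-clock totals implied by the reported per-step times over 50 steps.
steps = 50
per_step_seconds = {
    "ms-swift, fastest setting": 8,
    "ms-swift, slowest setting": 36,
    "verl (midpoint of 60-70 s)": 65,
}
for name, sec in per_step_seconds.items():
    print(f"{name}: ~{steps * sec / 60:.0f} min total")
# -> ~7 min, ~30 min, and ~54 min respectively
```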
Confusion: verl seems relatively slow here. I'm not sure whether I have misconfigured some parameters. Open to discussion!
The related report can be found at https://api.wandb.ai/links/jinghan0119-zhejiang-university/1ofvr2w1
Maybe it's because you used "lora" fine-tuning for MS-SWIFT, while verl only supports "full" fine-tuning?