Add asynchronous rollout + reward stage to PPOTrainer
When training on code tasks, the reward stage can take quite a long time, as it requires compiling the model's output and running a large number of test cases. In some setups we have seen the reward stage take almost as long as rollout (140s vs. 180s respectively). Yet it is possible to hide some of the reward stage's latency by overlapping it with the rollout stage.
Overlapping the rollout and reward stages is possible because 1) vLLM can asynchronously return the first trajectory before it finishes the rest, and 2) rollout is GPU-bound while verification is CPU-bound. This feature could be implemented by leveraging vLLM's AsyncLLMEngine API. I've looked into how this could be done in veRL, and it seems the feature would require changes to the DataProto, BaseRollout and *RewardManager APIs.
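As a rough sketch of the idea (not veRL's actual API: `compute_reward` is a hypothetical stand-in for compiling and running test cases, and exact AsyncLLMEngine signatures vary across vLLM versions), something like:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams


def compute_reward(text: str) -> float:
    # Hypothetical verifier: compile the model's output and run test cases.
    # CPU-bound, so it can run in worker processes while the GPU generates.
    return float(bool(text))


async def rollout_with_overlapped_rewards(prompts: list[str], model: str):
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model))
    sampling = SamplingParams(max_tokens=512)
    loop = asyncio.get_running_loop()
    pool = ProcessPoolExecutor()

    async def handle(i: int, prompt: str) -> tuple[str, float]:
        final = None
        # generate() is an async generator; the last yielded output is the
        # finished trajectory, available before other requests complete.
        async for out in engine.generate(prompt, sampling, request_id=str(i)):
            final = out
        text = final.outputs[0].text
        # Kick off verification immediately instead of waiting for the batch,
        # hiding reward latency behind the remaining rollouts.
        reward = await loop.run_in_executor(pool, compute_reward, text)
        return text, reward

    return await asyncio.gather(*(handle(i, p) for i, p in enumerate(prompts)))
```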
Would it be possible to implement something like this in veRL? If you are interested in a feature like this, but don't have the bandwidth, I could help out myself.
Has been implemented in https://github.com/agentica-project/verl-pipeline, which was used to train the DeepCoder-14B model.
They saw a 2.5x speedup in code RL training.
Great! Didn't know about this project! Are there any plans to add this functionality to veRL?
I think they mention that this is only done for the 1.5B, not the 14B, yet. Would definitely love to see this merged into verl.
> I think they mention that this is only done for the 1.5B, not the 14B, yet
Yeah, it appears you are right; at least, they only provide the comparison for the 1.5B models. I'll keep this issue open then. It certainly seems like a good feature to have.
@faresobeid @sunjin-k @dvmazur It seems that their implementation is based on an earlier version of vLLM (before v0.8.2). As of vllm==0.8.2 and 0.8.3, the model executor runs in background processes launched by AsyncLLM, so we can no longer access the model weights from AsyncLLMEngine.
@youkaichao could you comment on why the latest version of vLLM limits weight handles?
And yes, I agree that using the async LLM engine sounds like a promising approach overall. We can use a compatible version of vLLM to develop the feature while waiting for compatibility patches from vLLM main.
> why the latest version of vLLM limits weight handles
what do weight handles mean?
> why the latest version of vLLM limits weight handles
> what do weight handles mean?
I think it refers to access to the model executor (the LLM). In the implementation above, the weights can be updated directly: https://github.com/agentica-project/verl-pipeline/blob/master/verl/workers/sharding_manager/fsdp_vllm.py#L99-L102.
I think that since vllm 0.8.2, the model executor runs in background processes, so we can no longer access the weights.
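For context, the pre-0.8.2 pattern looked roughly like the following (a sketch only; the exact attribute chain varies across vLLM versions, and `sync_weights_into_vllm` is a hypothetical helper, not verl-pipeline's actual function):

```python
import torch


def sync_weights_into_vllm(inference_engine, actor_weights: dict[str, torch.Tensor]) -> None:
    # Before vLLM 0.8.2 the model executor lived in the driver process,
    # so the trainer could reach the model runner's nn.Module directly
    # and push updated actor weights into it in place.
    model = inference_engine.llm_engine.model_executor.driver_worker.model_runner.model
    model.load_weights(actor_weights.items())
```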
I have tried the asynchronous rollout approach in verl-pipeline, but I'm currently facing an issue: when rollout_wg and actor_wg are separated, updating the vLLM parameters relies on Ray's communication plane instead of high-performance communication operators like NCCL. This increases the communication load on the driver process (since all data is distributed through the driver), which results in worse performance than the HybridEngine. In actual tests, communication takes 80 seconds, accounting for 20% of a 400-second step. Resolving this issue might require Ray to support GPU-level communication.
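To illustrate the data path described above (a toy sketch, not verl-pipeline's code; `ActorWorker` and `RolloutWorker` are made-up names):

```python
import ray
import torch


@ray.remote(num_gpus=1)
class ActorWorker:
    def get_weights(self) -> dict[str, torch.Tensor]:
        # Tensors leave the GPU and are serialized into Ray's object store.
        return {"layer.weight": torch.zeros(1024, 1024)}


@ray.remote(num_gpus=1)
class RolloutWorker:
    def set_weights(self, weights: dict[str, torch.Tensor]) -> None:
        # Deserialized from the object store, then copied back onto the GPU;
        # no direct GPU-to-GPU (NCCL) transfer between the two worker groups.
        for tensor in weights.values():
            tensor.cuda()


ray.init()
actor, rollout = ActorWorker.remote(), RolloutWorker.remote()
ray.get(rollout.set_weights.remote(actor.get_weights.remote()))
```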
Hi! Any updates here? Maybe we should use SGLang instead of vllm for the requested pipeline from DeepCoder?
Hi! I noticed veRL now supports async rollout for SGLang. I'm sure it would be possible to implement agentica's pipeline using it. Would you be interested in a feature like this? I could come up with a design / MVP if you are interested.
> Hi! I noticed veRL now supports async rollout for SGLang. I'm sure it would be possible to implement agentica's pipeline using it. Would you be interested in a feature like this? I could come up with a design / MVP if you are interested.
That would be great! Although I am not sure how you would want to leverage SGLang's async rollout.