Zhou Zhibo comments

Zhou Zhibo

Will asynchronous generation and training be supported?

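In our usage we have found that running generation and training asynchronously brings a considerable performance improvement. Is there currently a plan to develop this feature, and if so, what is the proposed design?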
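To make the request concrete, here is a minimal sketch of what "asynchronous generation and training" could look like: a rollout stage keeps producing batches while the trainer consumes earlier ones, with a bounded queue limiting how far generation may run ahead. This is only an illustration under assumptions, not the project's API: `rollout_worker`, `trainer`, `run_async_pipeline`, and `max_staleness` are hypothetical names, and `asyncio.sleep` stands in for real generation and training work, which in practice would run on separate workers or GPUs.

```python
import asyncio


async def rollout_worker(queue, num_batches):
    """Hypothetical generation stage: produces one batch per step."""
    for step in range(num_batches):
        await asyncio.sleep(0.01)  # stand-in for generation latency
        # Blocks when the queue is full, so generation cannot run too far ahead.
        await queue.put({"step": step, "samples": [step] * 4})
    await queue.put(None)  # sentinel: no more batches


async def trainer(queue):
    """Hypothetical training stage: consumes batches as they arrive."""
    trained_steps = []
    while True:
        batch = await queue.get()
        if batch is None:
            break
        await asyncio.sleep(0.01)  # stand-in for one training step
        trained_steps.append(batch["step"])
    return trained_steps


async def run_async_pipeline(num_batches=4, max_staleness=1):
    # max_staleness (assumed name) bounds how many finished batches may wait
    # in the queue before the rollout stage has to pause.
    queue = asyncio.Queue(maxsize=max_staleness)
    producer = asyncio.create_task(rollout_worker(queue, num_batches))
    trained_steps = await trainer(queue)
    await producer
    return trained_steps


if __name__ == "__main__":
    print(asyncio.run(run_async_pipeline()))
```

The bounded queue is the main design choice here: it lets generation for the next batch overlap with training on the current one while capping how stale the samples can become relative to the latest policy.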

Add asynchronous rollout + reward stage to PPOTrainer

I have tried the asynchronous rollout approach in the verl-pipeline, but currently I’m facing an issue: when rollout_wg and actor_wg are separated, updating the vLLM parameters relies on Ray's communication...