About using vLLM for generation
I have some thoughts about using vLLM for generation. Feel free to correct me if I'm wrong.
- Batching
It seems that prompts are still passed to the vLLM engines in micro rollout batches during `make_experience`. However, passing all prompts to the vLLM engines at once is very likely to improve generation throughput, since vLLM's continuous-batching scheduler can then pack requests across the whole set itself; see the sketch below.
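For illustration, here is a minimal sketch of that change against vLLM's offline `LLM.generate` API. The model name, `all_prompts`, and the commented-out micro-batch loop are placeholders (OpenRLHF actually drives the engines through Ray actors, so the real change would live in `make_experience`):

```python
from vllm import LLM, SamplingParams

# Placeholder engine; in OpenRLHF the engine lives behind a Ray actor.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=2)
sampling_params = SamplingParams(temperature=1.0, max_tokens=1024)
all_prompts = ["prompt 0", "prompt 1", "prompt 2"]  # the full rollout set

# Instead of generating per micro rollout batch, roughly:
#   for micro_batch in chunks(all_prompts, micro_rollout_batch_size):
#       outputs.extend(llm.generate(micro_batch, sampling_params))
# submit every prompt in one call and let vLLM's continuous-batching
# scheduler keep the GPUs saturated:
outputs = llm.generate(all_prompts, sampling_params)
```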
- Placement
The device placement of the vLLM engines seems quite random. For example, this is what happens when running `examples/scripts/train_ppo_llama_ray_70b.sh`:

run 1, master node:
```
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 780546 C ray::CriticModelRayActor 2806MiB |
| 1 N/A N/A 780769 C ray::CriticModelRayActor 2970MiB |
| 2 N/A N/A 780770 C ray::CriticModelRayActor 2990MiB |
| 3 N/A N/A 780771 C ray::CriticModelRayActor 2798MiB |
| 4 N/A N/A 781017 C ray::RewardModelRayActor 2530MiB |
| 5 N/A N/A 781612 C ray::RewardModelRayActor 2526MiB |
| 6 N/A N/A 787426 C ray::RayWorkerVllm 74344MiB |
| 7 N/A N/A 787427 C ray::RayWorkerVllm 74264MiB |
+-----------------------------------------------------------------------------+
```
run 2, master node:
```
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 531162 C ray::ActorModelRayActor.fit 2822MiB |
| 1 N/A N/A 531384 C ray::ActorModelRayActor.fit 3014MiB |
| 2 N/A N/A 531385 C ray::ActorModelRayActor.fit 3014MiB |
| 3 N/A N/A 531386 C ray::ActorModelRayActor.fit 2822MiB |
| 4 N/A N/A 531387 C ray::CriticModelRayActor 2824MiB |
| 5 N/A N/A 532043 C ray::CriticModelRayActor 3016MiB |
| 6 N/A N/A 532044 C ray::CriticModelRayActor 3016MiB |
| 7 N/A N/A 532045 C ray::CriticModelRayActor 2824MiB |
+-----------------------------------------------------------------------------+
```
To reduce the communication overhead of broadcasting parameters from the actor models to the vLLM engines, the vLLM engines and the actor models should be placed as close to each other as possible (e.g. on the same node). The ideal device placement for this training task would be:
```
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 531162 C ray::ActorModelRayActor.fit 2822MiB |
| 1 N/A N/A 531384 C ray::ActorModelRayActor.fit 3014MiB |
| 2 N/A N/A 531385 C ray::ActorModelRayActor.fit 3014MiB |
| 3 N/A N/A 531386 C ray::ActorModelRayActor.fit 2822MiB |
| 4 N/A N/A 787426 C ray::RayWorkerVllm 74344MiB |
| 5 N/A N/A 787427 C ray::RayWorkerVllm 74264MiB |
| 6 N/A N/A 787426 C ray::RayWorkerVllm 74344MiB |
| 7 N/A N/A 787427 C ray::RayWorkerVllm 74264MiB |
+-----------------------------------------------------------------------------+
```
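One way to enforce that co-location could be a Ray placement group with a PACK strategy. The sketch below is a minimal illustration under assumptions, not OpenRLHF's actual API: `ActorModelRayActor` here is a stand-in for the real worker class, and the bundle sizes mirror this 4-GPU-actor / 4-GPU-vLLM example:

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# One bundle for the actor models, one for the vLLM engines. PACK asks Ray
# to place the bundles on as few nodes as possible, so both groups end up
# on the same node whenever it has enough free GPUs.
pg = placement_group(bundles=[{"GPU": 4}, {"GPU": 4}], strategy="PACK")
ray.get(pg.ready())

@ray.remote(num_gpus=1)
class ActorModelRayActor:  # stand-in for OpenRLHF's actor worker class
    def ping(self):
        return "ok"

actor_workers = [
    ActorModelRayActor.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg,
            placement_group_bundle_index=0,  # actor-model bundle
        )
    ).remote()
    for _ in range(4)
]
# The vLLM engine workers would be scheduled the same way with
# placement_group_bundle_index=1, landing next to the actor models.
```

PACK is best-effort; using STRICT_PACK instead would make Ray refuse to schedule the group at all unless both bundles fit on one node, which may be preferable to silently spreading them apart.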