
[Feature] Pipeline Parallelization of Different Stages in RLHF

Open llkn-2 opened this issue 7 months ago • 4 comments

Motivation

The RLHF process can be divided into three stages: Generation, Forward, and Train. In the Generation stage, responses are generated using vLLM. During the Forward stage, the actor, critic, reference, and reward models perform inference. In the Train stage, the actor and critic models are trained.

While one stage executes, the GPUs assigned to the other stages sit idle, wasting resources.

To address this issue, we can optimize the process by leveraging the concept of pipeline parallelism. The batch data is divided into multiple smaller micro-batches. After processing a micro-batch in one stage, the data is immediately passed to the next stage for processing, rather than waiting for the entire batch to be completed. This approach reduces the idle time of GPUs in each stage, thereby improving resource utilization.
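As a minimal sketch of the micro-batch pipelining idea described above (not the actual xtuner implementation): each stage runs in its own worker, pulls micro-batches from an input queue, and pushes results downstream, so a micro-batch enters Forward as soon as Generation finishes it. The `generate`/`forward`/`train` functions here are hypothetical stand-ins for the real vLLM generation, model inference, and training steps.

```python
import queue
import threading

# Hypothetical stage functions standing in for vLLM generation,
# model inference, and training on a single micro-batch.
def generate(micro_batch):
    return f"gen({micro_batch})"

def forward(item):
    return f"fwd({item})"

def train(item):
    return f"train({item})"

results = []

def run_stage(fn, inbox, outbox):
    """Consume micro-batches from inbox, process, and hand off downstream."""
    while True:
        item = inbox.get()
        if item is None:              # sentinel: propagate shutdown downstream
            if outbox is not None:
                outbox.put(None)
            break
        result = fn(item)
        if outbox is not None:
            outbox.put(result)        # next stage can start immediately
        else:
            results.append(result)    # last stage collects final outputs

q_gen, q_fwd, q_train = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=run_stage, args=(generate, q_gen, q_fwd)),
    threading.Thread(target=run_stage, args=(forward, q_fwd, q_train)),
    threading.Thread(target=run_stage, args=(train, q_train, None)),
]
for t in stages:
    t.start()

# Split one batch into 4 micro-batches; each flows through the three
# stages without waiting for the whole batch to finish a stage.
for micro_batch in range(4):
    q_gen.put(micro_batch)
q_gen.put(None)

for t in stages:
    t.join()
print(results)
```

In a real deployment each stage would run on its own GPU group and the hand-off would go through an RPC or actor framework rather than in-process queues, but the overlap pattern is the same.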

Modification

The code has been modified based on PR https://github.com/InternLM/xtuner/pull/736. The entrypoint file is xtuner/rlhf/pipeline.py.

llkn-2 · Jul 31 '24 10:07