DeepSpeedExamples
The problem of model parallelism in training
In most cases, I think we need model parallelism more than data parallelism. It would be very helpful if the model could be trained with model parallelism, because current models are so large that a single 80 GB GPU cannot hold the complete set of actor, reference, critic, and reward models. For example, I used llama 7B as the actor, reference, critic, and reward model, and the four models could not fit across three A100 80G GPUs; training failed with an OOM error.
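A rough back-of-the-envelope sketch illustrates why this OOMs. Assuming fp16 weights and the usual mixed-precision Adam accounting from the ZeRO paper (about 16 bytes per parameter for a trained model: fp16 weights and gradients plus fp32 master weights, momentum, and variance), while the frozen reference and reward models only hold fp16 weights, four 7B models already approach the 240 GB total of three A100 80G cards before counting activations or generation KV caches. The per-model roles below follow the DeepSpeed-Chat RLHF setup; the exact byte counts are an estimate, not a measurement.

```python
# Back-of-envelope memory estimate for RLHF with four 7B models (assumption:
# fp16 weights, mixed-precision Adam ~16 bytes/param for trained models).
GB = 1024 ** 3
params = 7e9

def trainable_gb(n):
    # fp16 weights + fp16 grads (4 B) + fp32 master weights, momentum,
    # variance for Adam (12 B) = ~16 bytes per parameter
    return n * 16 / GB

def frozen_gb(n):
    # inference-only models keep just fp16 weights (2 bytes per parameter)
    return n * 2 / GB

actor = trainable_gb(params)      # trained with PPO
critic = trainable_gb(params)     # trained value model
reference = frozen_gb(params)     # frozen KL reference
reward = frozen_gb(params)        # frozen reward model

total = actor + critic + reference + reward
print(f"trainable model each ~{actor:.0f} GB, frozen model each ~{reference:.0f} GB")
print(f"total ~{total:.0f} GB of model states, before activations and KV caches")
```

With this accounting the model states alone come to roughly 235 GB, so three 80 GB GPUs leave almost no headroom for activations, which is consistent with the OOM unless the states are partitioned (e.g. ZeRO stage 3) or offloaded.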