MOSS-RLHF

Can I run this pipeline on A100-40GB?

Open zwhe99 opened this issue 1 year ago • 3 comments

Salute to your valuable contribution first!

I noticed that you ran the experiments on eight A100-80GB GPUs. Can the pipeline also run on eight A100-40GB GPUs with a technique such as ZeRO-3?

zwhe99 avatar Jul 13 '23 12:07 zwhe99

In our experiments, GPU memory usage was about 50 GB with ZeRO-2 and without offload. If you have enough CPU memory, you may be able to train your model on A100-40GB GPUs.

Ablustrund avatar Jul 13 '23 15:07 Ablustrund

That figure is with a batch size of 2*8.

Ablustrund avatar Jul 13 '23 15:07 Ablustrund

Thanks. What is your RAM usage?

zwhe99 avatar Jul 13 '23 15:07 zwhe99

Memory usage is about 800 to 900 GB with a batch size of 2*8 and 4k multi-turn queries in the training phase.

Ablustrund avatar Jul 14 '23 06:07 Ablustrund
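
For readers who want to try the offload route mentioned above, here is a minimal sketch of a DeepSpeed-style config with CPU offload enabled. It is not taken from the MOSS-RLHF repository: the helper name `build_zero_offload_config` is hypothetical, the micro-batch size of 2 per GPU is only one possible reading of the "2*8" batch size, and the thread does not confirm whether the training scripts accept this config as-is. The config keys themselves (`zero_optimization`, `stage`, `offload_optimizer`, `offload_param`, `train_micro_batch_size_per_gpu`) are standard DeepSpeed options.

```python
import json


def build_zero_offload_config(stage=3, micro_batch_per_gpu=2):
    """Build a DeepSpeed-style config dict with CPU offload (hypothetical helper)."""
    zero = {
        "stage": stage,
        # Keep optimizer states in CPU RAM instead of GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
    if stage == 3:
        # Parameter offload is only available with ZeRO stage 3.
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        # Assumption: "2*8" means 2 samples per GPU across 8 GPUs.
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "zero_optimization": zero,
    }


config = build_zero_offload_config()
print(json.dumps(config, indent=2))
```

Offloading trades GPU memory for CPU memory and transfer time, so the 800 to 900 GB of RAM reported above (measured without offload) should be treated as a lower bound when sizing the host.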