MOSS-RLHF
Can I run this pipeline on A100-40GB?
First of all, thank you for your valuable contribution!
I noticed that you ran the experiments on eight A100-80GB GPUs. Can the pipeline also be run on eight A100-40GB GPUs with a technique such as ZeRO-3?
In our experiments, GPU memory usage was about 50 GB with ZeRO-2 and without offload, at a batch size of 2*8. If you have a large amount of CPU memory, you may be able to train your model on A100-40GB GPUs.
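For anyone who wants to try the CPU-memory route mentioned above, here is a minimal sketch of a DeepSpeed config with ZeRO-3 and CPU offload. It is not the configuration used in the MOSS-RLHF experiments: the function name `build_zero3_offload_config` is hypothetical, the micro batch size of 2 per GPU is only one reading of "2*8", and the bf16 setting is an assumption. Whether this actually fits in 40 GB has not been verified here.

```python
import json


def build_zero3_offload_config(micro_batch_size_per_gpu: int = 2) -> dict:
    """Build an illustrative DeepSpeed config dict (ZeRO-3 with CPU offload).

    Assumption: "2*8" in the thread means 2 samples per GPU on 8 GPUs.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size_per_gpu,
        # Assumption: bf16 mixed precision, which A100 GPUs support.
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 partitions parameters, gradients and optimizer states.
            "stage": 3,
            # Move optimizer states and parameters to host (CPU) memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be saved to a file for DeepSpeed.
    print(json.dumps(build_zero3_offload_config(), indent=2))
```

Offloading trades GPU memory for host RAM and extra CPU-GPU transfer time, so expect slower steps and higher CPU memory use than in the ZeRO-2 setup.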
Thanks! And what is your RAM usage?
Memory usage was about 800~900 GB with a batch size of 2*8 and 4k multi-turn queries in the training phase.