zero-bubble-pipeline-parallelism

interleaved 1F1B seems to work better

Open zhj96 opened this issue 9 months ago • 3 comments

I ran several sets of experiments and found that ZB is faster than plain 1F1B. However, interleaved 1F1B is slightly faster than ZB_V, and only slightly slower than ZB_2P while using much less GPU memory.

Machine: 8× H800 80G
Model: 6.2B

| Schedule | Throughput (samples/s, 8 GPUs) | GPU memory |
| --- | --- | --- |
| 1F1B | 55 | 48G |
| INTERLEAVED_1F1B | 66 | 57G |
| ZB_2P | 67 | 79G |
| ZB_V | 64 | 53G |
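
To make the trade-off easier to read, here is a small sketch that normalizes the numbers above against the plain 1F1B run. The figures are copied from this post; the helper name `relative_to_baseline` and the dictionary layout are just illustrative choices, not part of the repository.

```python
# Throughput (samples/s on 8 GPUs) and GPU memory (G), copied from this post.
results = {
    "1F1B": (55, 48),
    "INTERLEAVED_1F1B": (66, 57),
    "ZB_2P": (67, 79),
    "ZB_V": (64, 53),
}


def relative_to_baseline(results, baseline="1F1B"):
    """Return (throughput ratio, memory ratio) of each schedule vs. the baseline."""
    base_tput, base_mem = results[baseline]
    return {
        name: (tput / base_tput, mem / base_mem)
        for name, (tput, mem) in results.items()
    }


ratios = relative_to_baseline(results)
for name, (speedup, mem_ratio) in ratios.items():
    print(f"{name:<17} throughput x{speedup:.2f}  memory x{mem_ratio:.2f}")
```

Read this way, the question is whether the small throughput edge of ZB_2P over interleaved 1F1B is worth its much larger memory ratio on this setup.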

zhj96 · May 14 '24 12:05