zero-bubble-pipeline-parallelism
interleaved 1F1B seems to work better
I ran several sets of experiments and found that ZB is faster than plain 1F1B. However, interleaved 1F1B is slightly faster than ZB_V, and only slightly slower than ZB_2P while using much less GPU memory.
Machine: 8 x H800 80G
Model: 6.2B
Results (throughput in samples/second across 8 GPUs, GPU memory in GB):
1F1B: 55 samples/s, 48 GB
INTERLEAVED_1F1B: 66 samples/s, 57 GB
ZB_2P: 67 samples/s, 79 GB
ZB_V: 64 samples/s, 53 GB
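
To make the trade-off easier to read, here is a small sketch that normalizes the measurements above against plain 1F1B. The numbers are copied from this post; the `results` dictionary and the `relative_to_baseline` helper are names made up for illustration and are not part of the repository.

```python
# Figures copied from the measurements above:
# (throughput in samples/s across 8 GPUs, GPU memory in GB).
results = {
    "1F1B": (55, 48),
    "INTERLEAVED_1F1B": (66, 57),
    "ZB_2P": (67, 79),
    "ZB_V": (64, 53),
}


def relative_to_baseline(results, baseline="1F1B"):
    """Return (throughput ratio, memory ratio) for each schedule vs. the baseline."""
    base_tput, base_mem = results[baseline]
    return {
        name: (tput / base_tput, mem / base_mem)
        for name, (tput, mem) in results.items()
    }


ratios = relative_to_baseline(results)
for name, (speedup, mem_ratio) in ratios.items():
    print(f"{name:<17} throughput x{speedup:.2f}  memory x{mem_ratio:.2f}")
```

On these numbers, interleaved 1F1B is about 20% faster than 1F1B for about 19% more memory, ZB_2P is about 22% faster for about 65% more memory, and ZB_V is about 16% faster for about 10% more memory.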