Zero Bubble Pipeline Parallelism

Results: 24 zero-bubble-pipeline-parallelism issues

**Your question** It seems that B's timing includes W, while W merely accounts for the time of gradient accumulation. In the megatron/core/pipeline_parallel/zb_schedules.py file, the function `schedule_b` counts the duration of...

I tried multiple sets of experiments, but found that ZB is better than 1F1B. Interleaved 1F1B seems to be slightly faster than ZB_V, slightly slower than ZB_2P but saves a...

Currently the limitation is that `(number_of_layers / number_of_stage)` needs to be an even number.
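As a rough illustration of that constraint, a pre-flight check could look like the sketch below. The function name `layers_per_stage_is_even` and its parameters are hypothetical and not taken from the repository. It also assumes that the layer count must divide evenly across stages, which the issue does not state.

```python
def layers_per_stage_is_even(number_of_layers: int, number_of_stages: int) -> bool:
    """Return True when each stage gets an even, whole number of layers.

    Hypothetical helper for illustration only; not part of the project's code.
    """
    if number_of_stages <= 0:
        raise ValueError("number_of_stages must be positive")
    # Assumption: layers must split evenly across pipeline stages.
    if number_of_layers % number_of_stages != 0:
        return False
    # The limitation from the issue: layers per stage must be even.
    return (number_of_layers // number_of_stages) % 2 == 0


if __name__ == "__main__":
    for layers, stages in [(32, 8), (24, 8), (30, 8)]:
        print(layers, stages, layers_per_stage_is_even(layers, stages))
```

For example, 32 layers over 8 stages gives 4 layers per stage and passes, while 24 layers over 8 stages gives 3 and does not.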

Hi, I currently want to adapt zbv for Paddle. In your work, the main role of rollback is to reduce synchronization. However, the grad_norm in the opt stage requires all_reduce_sum,...

Hi, I really appreciate your work. I have a question about zbh1 mode. This is one part of your code: ``` # For BWF pattern or in rank 0, we don't...

@ufotalent To implement a version using our own running engine and async IO
@QPHutu To implement a version by modifying 1f1b schedule using sync IO