zero-bubble-pipeline-parallelism issues

[QUESTION] The timing for B and W appears to be incorrect

2

**Your question** It seems that B's timing includes W, while W merely accounts for the time of gradient accumulation. In `megatron/core/pipeline_parallel/zb_schedules.py`, the function `schedule_b` counts the duration of...

RookieHong

interleaved 1F1B seems to work better

3

I tried multiple sets of experiments, but found that ZB is better than 1F1B. Interleaved 1F1B seems to be slightly faster than ZB_V, slightly slower than ZB_2P but saves a...

zhj96

More general ZBV scheduling

1

Currently the limitation is that `(number_of_layers / number_of_stage)` needs to be an even number.

mavenlin

[QUESTION] How to avoid synchronization when using sharding

3

Hi, I want to adapt ZBV for Paddle. In your work, the main role of rollback is to reduce synchronization. However, the grad_norm in the opt stage requires all_reduce_sum,...

AndSonder

[WIP] Add a script to extract profiling data from nsys

huanggx-sea

[QUESTION] May I ask what tool was used to plot Figure 6 in the paper? How can I profile bubble time in pipeline parallelism?

3

**Your question** How can I profile bubble time in pipeline parallelism?

starstream

[QUESTION] Whether to split bw when send_backward_recv_forward is not enabled

4

Hi, I really appreciate your work. I have a question about zbh1 mode. This is one part of your code: ``` # For BWF pattern or in rank 0, we don't...

AndSonder

Support sequence parallel on main branch

1

ufotalent

Create a miniversion containing only ZB-H1 and essential changes so other megatron forks can easily integrate

5

@ufotalent: implement a version using our own running engine and async IO.
@QPHutu: implement a version by modifying the 1F1B schedule using sync IO.

ufotalent

Support sequence-parallel for zero bubble schedule

QPHutu
