zero-bubble-pipeline-parallelism issues

Results: 24 zero-bubble-pipeline-parallelism issues

[ENHANCEMENT] Support V-Min and V-Half schedules

https://arxiv.org/abs/2405.15362

[QUESTION] Measuring Pipeline Bubble Time During Megatron-LM Training

I'm curious how you measured the precise bubble time during a run in your experiments (T_Comm in the paper). Megatron-LM's scheduling combines communication and idle time within the same NCCL...

HodBadichi

[QUESTION] 1f1b is faster than zero-v

I tested Llama 2 13B on A800 with pipeline parallelism (pp) of 4, micro-batch-size = 1, and global-batch-size = 64. Here is the 1f1b log (plain 1f1b, no vp): iteration...

kuangdao
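For context on the comparison above, here is a rough sketch of the textbook bubble estimate for a 1F1B schedule, (p - 1) / (m + p - 1), where p is the pipeline size and m is the number of microbatches per iteration. It assumes every stage takes the same time per microbatch and ignores communication. The data-parallel size is not stated in the issue, so `dp_size=1` is an assumption, and the function names are hypothetical rather than anything from Megatron-LM.

```python
def num_microbatches(global_batch_size: int, micro_batch_size: int, dp_size: int) -> int:
    """Microbatches each pipeline processes per iteration."""
    per_step = micro_batch_size * dp_size
    if per_step <= 0 or global_batch_size % per_step != 0:
        raise ValueError("global batch must be a positive multiple of micro_batch_size * dp_size")
    return global_batch_size // per_step


def one_f_one_b_bubble_fraction(pp_size: int, microbatches: int) -> float:
    """Idle fraction of one 1F1B iteration, assuming equal per-stage compute time."""
    if pp_size < 1 or microbatches < 1:
        raise ValueError("pp_size and microbatches must be >= 1")
    return (pp_size - 1) / (microbatches + pp_size - 1)


# Settings quoted in the issue; dp_size=1 is an assumption, not from the issue.
m = num_microbatches(global_batch_size=64, micro_batch_size=1, dp_size=1)
fraction = one_f_one_b_bubble_fraction(pp_size=4, microbatches=m)
print(f"{m} microbatches, estimated 1F1B bubble fraction {fraction:.3f}")
```

Under these assumptions the estimated bubble is only a few percent of the iteration, so there is little idle time left for another schedule to remove. A larger data-parallel size would shrink m and raise the estimate.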

[QUESTION] Does zero bubble pp support flash-attention?

I see that zero-bubble-pipeline-parallelism disables FusedLayerNorm. Is that because the fused op cannot split its backward pass into the w and x parts?

qq1243196045
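To make "split backward of w and x" concrete, here is a minimal sketch for a plain linear layer y = x @ W. The input gradient, which the previous pipeline stage is waiting for, and the weight gradient, which no other stage depends on, are computed by separate functions. This is an illustration in pure Python with hypothetical helper names, not the repository's implementation, and it does not show how fused kernels are handled.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def backward_input(grad_y: Matrix, weight: Matrix) -> Matrix:
    """x part: grad_x = grad_y @ W^T, needed right away by the previous stage."""
    return matmul(grad_y, transpose(weight))


def backward_weight(x: Matrix, grad_y: Matrix) -> Matrix:
    """w part: grad_W = x^T @ grad_y, which can be deferred to fill idle time."""
    return matmul(transpose(x), grad_y)


x = [[1.0, 2.0]]                       # 1 x 2 input saved from the forward pass
weight = [[1.0, 0.0, 2.0],
          [0.0, 1.0, 3.0]]             # 2 x 3 weight
grad_y = [[1.0, 1.0, 1.0]]             # 1 x 3 gradient arriving from the next stage

grad_x = backward_input(grad_y, weight)   # computed immediately
deferred = (x, grad_y)                    # saved so the w part can run later
grad_w = backward_weight(*deferred)       # computed whenever the schedule allows
print(grad_x, grad_w)
```

The split works here because the two gradients are independent once `x` and `grad_y` are saved. A fused kernel that produces both in a single call would have to expose them separately for the same scheduling to apply.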
