Possible Typo in ZB1P Pipeline Bubble Calculation Formula in DeepSeek-V3 Report

Open yzhblind opened this issue 1 year ago • 4 comments

In the DeepSeek-V3 report PDF, I noticed that on page 13, the total bubble for the ZB1P pipeline parallel method is described as (PP-1)(F+B-2W), whereas in the original Zero Bubble paper, the total bubble for the ZB-H1 method should be (PP-1)(F+B-W). Could this be a typo?

Feb 27 '25 05:02 yzhblind

I think the ZB1P pipeline parallel method on page 13 is ZB-H2, because DualPipe's bubble size is smaller than that of every version of Zero Bubble Pipeline Parallelism.

Mar 12 '25 08:03 Onlybyuse

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!

Apr 14 '25 00:04 github-actions[bot]

Hello @yzhblind. In my view, the definition of B differs between DualPipe and ZB1P. One is the full backward chunk, including both backward for weights and backward for inputs. The other contains only backward for inputs.

Apr 16 '25 02:04 Liu-Weijie
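
If the two definitions of B differ in the way the comment above suggests, the two formulas can be reconciled by a simple substitution. The following is a sketch under that assumption: it takes the report's B to be the full backward time and the Zero Bubble paper's B to cover only the backward for inputs. The subscripted symbols are introduced here for illustration and do not appear in either source.

```latex
% Assumed notation (subscripts are illustrative, not from either source):
%   B_full : full backward chunk (the report's B, under the reading above)
%   B_in   : backward for inputs only (the Zero Bubble paper's B, under the reading above)
%   W      : backward for weights
\begin{align*}
B_{\text{full}} &= B_{\text{in}} + W \\
(PP-1)\,(F + B_{\text{in}} - W)
  &= (PP-1)\,\bigl(F + (B_{\text{full}} - W) - W\bigr) \\
  &= (PP-1)\,(F + B_{\text{full}} - 2W)
\end{align*}
```

Under this reading, (PP-1)(F+B-W) in the paper's notation and (PP-1)(F+B-2W) in the report's notation describe the same bubble, so the report's formula would not be a typo.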
