Possible Typo in ZB1P Pipeline Bubble Calculation Formula in DeepSeek-V3 Report
In the DeepSeek-V3 report PDF, I noticed that on page 13, the total bubble for the ZB1P pipeline parallel method is described as (PP-1)(F+B-2W), whereas in the original Zero Bubble paper, the total bubble for the ZB-H1 method should be (PP-1)(F+B-W). Could this be a typo?
I think the ZB1P pipeline parallel method on page 13 is ZB-H2, because DualPipe's bubble size is smaller than that of every version of Zero Bubble Pipeline Parallelism.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!
Hello @yzhblind, in my view the definition of B differs between DualPipe and ZB1P. In one, B is the full backward chunk, which includes both the backward pass for weights and the backward pass for inputs. In the other, B contains only the backward pass for inputs.
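To make that concrete, here is a short substitution sketch. It assumes the report's B is the full backward chunk and the Zero Bubble paper's B is the backward pass for inputs only; the subscripts "full" and "in" are labels I am introducing for illustration and appear in neither document. It also assumes the full backward time is simply the sum of the two parts.

```latex
% Assumed notation (subscripts are illustrative, not from either source):
%   B_full : full backward chunk (the B in the DeepSeek-V3 report)
%   B_in   : backward for inputs only (the B in the Zero Bubble paper)
%   W      : backward for weights
% Requires the amsmath package for align*.
\begin{align*}
B_{\text{full}} &= B_{\text{in}} + W \\
(PP-1)\,(F + B_{\text{full}} - 2W)
  &= (PP-1)\,(F + B_{\text{in}} + W - 2W) \\
  &= (PP-1)\,(F + B_{\text{in}} - W)
\end{align*}
```

Under those assumptions the two expressions describe the same bubble, so (PP-1)(F+B-2W) would be a change of notation rather than a typo.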