DeepSpeed
DeepSpeed copied to clipboard
flops_profiler: add option recompute_fwd for the case of activation c…
…heckpoint
When activation checkpointing is enabled, most of forward is re-computed, and so the FLOPS calculation should be updated.
In this PR, we just suppose all fwd is re-computed as an estimation.
I don't find a way to pass the option from model script to deepspeed engine, and so add option directly for flops_profiler.
with this PR, if the recompute_fwd is true, the output changes from
bwd FLOPS per GPU = 2 * fwd flops per GPU / bwd latency: ... TFLOPS fwd+bwd FLOPS per GPU = 3 * fwd flops per GPU / (fwd+bwd latency): ... TFLOPS FLOPS per GPU = 3 * fwd flops per GPU / iter latency: ... TFLOPS
to
bwd FLOPS per GPU = 3 * fwd flops per GPU / bwd latency: ... TFLOPS fwd+bwd FLOPS per GPU = 4 * fwd flops per GPU / (fwd+bwd latency): ... TFLOPS FLOPS per GPU = 4 * fwd flops per GPU / iter latency: ... TFLOPS
It is more accurate with this PR.
just have another idea, shall we change "recompute_fwd" from bool to float among [0.0, 1.0] (and also rename it as recompute_fwd_factor), since not all gemms are recomputed in the cases such as seq parallel in megatron-lm.
for the current implementation in megatron-deepspeed, we can set recompute_fwd_factor as 1.0.
comment are welcome, thanks.
any comment? thanks.
@jeffra @tjruwase @cli99 any comment? thanks.
just have another idea, shall we change "recompute_fwd" from bool to float among [0.0, 1.0] (and also rename it as recompute_fwd_factor), since not all gemms are recomputed in the cases such as seq parallel in megatron-lm.
for the current implementation in megatron-deepspeed, we can set recompute_fwd_factor as 1.0.
comment are welcome, thanks.
plan to change to this idea if no other comment, thanks.
just have another idea, shall we change "recompute_fwd" from bool to float among [0.0, 1.0] (and also rename it as recompute_fwd_factor), since not all gemms are recomputed in the cases such as seq parallel in megatron-lm. for the current implementation in megatron-deepspeed, we can set recompute_fwd_factor as 1.0. comment are welcome, thanks.
plan to change to this idea if no other comment, thanks.
recompute_fwd_factor is used in the latest commit, could you please take a review, thanks. @jeffra @tjruwase @cli99
@guoyejun thanks, recompute_fwd_factor is good to have.