Avoid calling set_save_original_input with FP8 delayed scaling
This pull request fixes a missing condition in the FP8 delayed scaling check related to set_save_original_input().
When FP8 delayed scaling is enabled (--fp8-recipe 'delayed'), the set_save_original_input() function should not be called. The condition that guards against this was accidentally omitted in commit 08814e8 (ADLR/megatron-lm!4030 - perf(MoE): Support recomputation for FP8 layernorm/moe_act/shared_experts).
This PR adds the missing condition to restore the correct behavior. It fixes the "AssertionError: DelayedScaling recipe is not supported with save_original_input" error that occurs in the released core_v0.14.0 version.
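For readers who want to see the shape of the guard, here is a minimal sketch. It is not the actual diff. The `SketchConfig` class, its field names, and the `maybe_set_save_original_input` helper are hypothetical stand-ins, and the real hook is passed in as a plain callable so the example runs without Megatron-LM or Transformer Engine installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the real config; field names are assumptions."""
    fp8: Optional[str] = None      # None means FP8 is disabled
    fp8_recipe: str = "delayed"    # mirrors the value passed to --fp8-recipe


def maybe_set_save_original_input(
    module: Any,
    config: SketchConfig,
    set_save_original_input: Callable[[Any], None],
) -> bool:
    """Call the hook unless FP8 delayed scaling is active.

    Returns True if the hook was called, False if it was skipped.
    """
    uses_delayed_scaling = config.fp8 is not None and config.fp8_recipe == "delayed"
    if uses_delayed_scaling:
        # Skipping the call avoids the "DelayedScaling recipe is not
        # supported with save_original_input" assertion described above.
        return False
    set_save_original_input(module)
    return True
```

Passing the hook as an argument is only a convenience for this sketch; the real code presumably calls set_save_original_input() directly inside the same kind of condition.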
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
Pull request vetters can view their responsibilities here.
Contributors can view more details about this message here.
/ok to test ea52007
/ok to test 775f386
/ok to test 3465f3f
/ok to test a5f5cd5
Thank you for your contribution!
NVIDIA Megatron-LM is currently transitioning to development on GitHub. We will aim to review your PR after we complete our transition and stabilize our GitHub development process.
Thank you for your understanding.