Megatron-LM Avoid calling set_save_original_input with FP8 delayed scaling

This pull request restores a missing condition in the FP8 delayed scaling check that guards the call to set_save_original_input().

When FP8 delayed scaling is enabled (--fp8-recipe 'delayed'), the set_save_original_input() function must not be called. The condition that enforces this was accidentally omitted in commit 08814e8 (ADLR/megatron-lm!4030 - perf(MoE): Support recomputation for FP8 layernorm/moe_act/shared_experts).

This PR adds the missing condition to restore the correct behavior, and fixes the "AssertionError: DelayedScaling recipe is not supported with save_original_input" error seen in the released core_v0.14.0 version.

Oct 14 '25 07:10 dalgarak

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Oct 14 '25 07:10 copy-pr-bot[bot]

/ok to test ea52007

Dec 01 '25 01:12 yaox12

/ok to test 775f386

Dec 02 '25 05:12 yaox12

/ok to test 3465f3f

Dec 05 '25 01:12 yaox12

/ok to test a5f5cd5

Dec 05 '25 01:12 yaox12

Thank you for your contribution!

NVIDIA Megatron-LM is currently transitioning to development on Github. We will aim to review your PR after we complete our transition and stabilize our Github development process.

Thank you for your understanding.

Dec 05 '25 01:12 github-actions[bot]