
[QUESTION] Why is sequence parallelism not supported by WrappedTorchLayerNorm (torch LayerNorm)?

Open mollon650 opened this issue 9 months ago • 4 comments

@cuichenx Just like TENorm, could the WrappedTorchLayerNorm class support sequence parallelism by adding the following?

```python
# Set flag for sequence parallelism (custom Megatron-LM integration)
if getattr(self, "sequence_parallel", None) is not None:
    self.weight.sequence_parallel = self.sequence_parallel
    self.bias.sequence_parallel = self.sequence_parallel
```
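For context, here is a minimal, hypothetical sketch of where such a flag would be set. The class name `SPFlaggedLayerNorm` and its constructor arguments are illustrative only, not Megatron-LM's actual WrappedTorchLayerNorm code:

```python
import torch

class SPFlaggedLayerNorm(torch.nn.LayerNorm):
    """Illustrative only: a plain torch LayerNorm whose parameters are
    tagged so downstream code can treat them as sequence-parallel."""

    def __init__(self, hidden_size, eps=1e-5, sequence_parallel=False):
        super().__init__(hidden_size, eps=eps)
        self.sequence_parallel = sequence_parallel
        # Tag the parameters; the normalization math itself is unchanged.
        self.weight.sequence_parallel = sequence_parallel
        self.bias.sequence_parallel = sequence_parallel
```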

mollon650 avatar Mar 05 '25 06:03 mollon650

No, changing the wrapper alone would not work. You would need the underlying implementation to support sequence parallelism.
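For context on what such support involves beyond the normalization math itself: under sequence parallelism each rank only sees its local shard of the sequence, so the gradients of flagged parameters must later be summed across the tensor-parallel group. A simplified sketch of that finalization step, assuming a hypothetical helper name and not Megatron-LM's actual code:

```python
import torch.distributed as dist

def allreduce_sequence_parallel_grads(model, tp_group):
    # Simplified sketch: gradients of parameters flagged with
    # `sequence_parallel` were computed from a local sequence shard only,
    # so they must be summed across the tensor-parallel group.
    for param in model.parameters():
        if getattr(param, "sequence_parallel", False) and param.grad is not None:
            dist.all_reduce(param.grad, group=tp_group)
```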

cuichenx avatar Mar 05 '25 16:03 cuichenx

@cuichenx Why would changing the wrapper alone not work? RMSNorm (Root Mean Square Normalization) does not operate across tokens; it normalizes each token independently across the hidden (feature) dimension. So RMSNorm is naturally compatible with sequence parallelism: each device can compute it locally on its sequence shard without any synchronization or collective communication. I am referencing the TENorm code. See the check below.
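The per-token claim can be checked numerically with torch's own LayerNorm (the same argument applies to RMSNorm): normalizing a sequence shard matches the corresponding slice of normalizing the full sequence, so the forward pass needs no communication across sequence-parallel ranks. A minimal standalone check:

```python
import torch
import torch.nn.functional as F

seq_len, hidden = 8, 16
x = torch.randn(seq_len, hidden)
weight = torch.randn(hidden)
bias = torch.randn(hidden)

# Normalize the full sequence vs. only a local "shard" of it.
full = F.layer_norm(x, (hidden,), weight, bias)
shard = F.layer_norm(x[:4], (hidden,), weight, bias)

# Each token is normalized over the hidden dimension independently,
# so the sharded result matches the corresponding slice of the full one.
assert torch.allclose(full[:4], shard)
```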

mollon650 avatar Mar 06 '25 02:03 mollon650

@cuichenx any suggestion?

mollon650 avatar Mar 11 '25 02:03 mollon650

Marking as stale. No activity in 60 days.

github-actions[bot] avatar May 10 '25 18:05 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Jul 28 '25 02:07 github-actions[bot]