Megatron-DeepSpeed
Are there any other layer norm functions, such as RMSNorm or DeepNorm?