Megatron-DeepSpeed
Test different layer norm
Script to reproduce diverging layer_norm weights