
Remove redundant layer norm operation

owmohamm opened this issue 4 years ago · 2 comments

In the pre-layernorm version of BERT, applying layernorm to the embeddings is redundant, since the first transformer layer applies layernorm to its input as well.

For reference: https://github.com/NVIDIA/Megatron-LM/blob/19301985dd31c8b612095cbad15bd903e8ddd497/megatron/model/language_model.py#L165
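To make the redundancy concrete, here is a minimal sketch (not the actual DeepSpeedExamples code; `PreLNBlock` and its parameters are hypothetical) of a standard pre-LN transformer block. Because the block normalizes its own input before attention, a separate layernorm applied to the embeddings right before the first block duplicates that normalization:

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Hypothetical pre-LN transformer block for illustration."""
    def __init__(self, hidden, heads):
        super().__init__()
        self.input_ln = nn.LayerNorm(hidden)      # normalizes the block's input
        self.attn = nn.MultiheadAttention(hidden, heads)
        self.post_attn_ln = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden),
            nn.GELU(),
            nn.Linear(4 * hidden, hidden),
        )

    def forward(self, x):
        # Pre-LN: the block applies layernorm to x itself before attention,
        # so any layernorm applied to x just before this block is redundant.
        y = self.input_ln(x)
        h = x + self.attn(y, y, y)[0]
        return h + self.mlp(self.post_attn_ln(h))

hidden, heads = 64, 4
emb = torch.randn(10, 2, hidden)       # (seq_len, batch, hidden)
redundant_ln = nn.LayerNorm(hidden)    # the embedding-level layernorm the issue proposes removing
block = PreLNBlock(hidden, heads)
out = block(redundant_ln(emb))         # input_ln immediately re-normalizes redundant_ln's output
```

Removing `redundant_ln`, as in the Megatron-LM reference above, leaves the first block's input layernorm as the sole normalization applied to the embeddings.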

owmohamm · Dec 05 '20

I don't have permission to merge the pull request, so could someone else from the DeepSpeed team do that?

owmohamm · Dec 08 '20

Hi Owais,

Thanks again for pointing out this possible bug in the DeepSpeed example. We are discussing it within the team and will merge the fix soon if there is no accuracy impact!

Thanks, Reza

RezaYazdaniAminabadi · Dec 08 '20