detcon-pytorch
Why use LayerNorm in the MLP?
I had some issues with the combination of SyncBatchNorm and EMA in distributed training, so I just replaced it with LayerNorm as a workaround.
Nothing stopping you from changing it back to batch norm, though.
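For reference, here is a minimal sketch of what the swap might look like. The function and argument names below are illustrative, not taken from this repo's code:

```python
import torch.nn as nn


def make_projector(dim_in: int, dim_hidden: int, dim_out: int,
                   use_batchnorm: bool = False) -> nn.Sequential:
    """Two-layer MLP projection head with a configurable norm layer.

    LayerNorm is the workaround used here; BatchNorm1d restores the
    original behaviour if it works in your distributed setup.
    """
    norm = nn.BatchNorm1d(dim_hidden) if use_batchnorm else nn.LayerNorm(dim_hidden)
    return nn.Sequential(
        nn.Linear(dim_in, dim_hidden),
        norm,
        nn.ReLU(inplace=True),
        nn.Linear(dim_hidden, dim_out),
    )


projector = make_projector(2048, 4096, 256, use_batchnorm=True)

# For multi-GPU training, plain BatchNorm layers can be converted to
# SyncBatchNorm, which is the combination (together with EMA) that caused
# the issues mentioned above:
# projector = nn.SyncBatchNorm.convert_sync_batchnorm(projector)
```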
Thanks for your reply. In my small experiment, I found that LayerNorm was slightly worse than BatchNorm, so I asked about the reason for using LayerNorm.