detcon-pytorch
Why use LayerNorm in the MLP?
I had some issues with the combination of SyncBatchNorm and EMA in distributed training, so I just replaced it with LayerNorm as a workaround.
Nothing stopping you from changing it back to batch norm, though.
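For reference, here is a minimal sketch of what the swap might look like. The function and argument names below are illustrative, not taken from this repo's code:

```python
import torch.nn as nn


def make_projector(dim_in: int, dim_hidden: int, dim_out: int,
                   use_batchnorm: bool = False) -> nn.Sequential:
    """Two-layer MLP projection head with a configurable norm layer.

    LayerNorm is the workaround used here; BatchNorm1d restores the
    original behaviour if it works in your distributed setup.
    """
    norm = nn.BatchNorm1d(dim_hidden) if use_batchnorm else nn.LayerNorm(dim_hidden)
    return nn.Sequential(
        nn.Linear(dim_in, dim_hidden),
        norm,
        nn.ReLU(inplace=True),
        nn.Linear(dim_hidden, dim_out),
    )


projector = make_projector(2048, 4096, 256, use_batchnorm=True)

# For multi-GPU training, plain BatchNorm layers can be converted to
# SyncBatchNorm, which is the combination (together with EMA) that caused
# the issues mentioned above:
# projector = nn.SyncBatchNorm.convert_sync_batchnorm(projector)
```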
Thanks for your reply. In my small experiment, I found that LayerNorm was slightly worse than BatchNorm, so I asked about the reason for using LayerNorm.