contrastive-learner
Does this handle global batch norm?
The SimCLR paper states the importance of global batch norm:
> In distributed training with data parallelism, the BN mean and variance are typically aggregated locally per device. In our contrastive learning, as positive pairs are computed in the same device, the model can exploit the local information leakage to improve prediction accuracy without improving representations. We address this issue by aggregating BN mean and variance over all devices during the training.
Does this implementation handle that?
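For context, in PyTorch the usual way to get this behavior under DistributedDataParallel is to convert the encoder's BatchNorm layers to `SyncBatchNorm` before wrapping the model, so BN statistics are aggregated across all devices rather than per device. A minimal sketch of what I mean (not code from this repo; `resnet50` and `local_rank` are placeholders):

```python
import torch.nn as nn
from torchvision.models import resnet50

# Placeholder: normally read from the launcher's environment (e.g. torchrun),
# and dist.init_process_group(backend="nccl") must already have been called.
local_rank = 0

# Any encoder containing BatchNorm layers; resnet50 is just an example.
net = resnet50()

# Replace every BatchNorm layer with SyncBatchNorm so mean/variance are
# aggregated over all devices during training -- the "global BN" that the
# SimCLR paper describes.
net = nn.SyncBatchNorm.convert_sync_batchnorm(net)

net = nn.parallel.DistributedDataParallel(
    net.cuda(local_rank), device_ids=[local_rank]
)
```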