contrastive-learner
Does this handle global batch norm?
The SimCLR paper states the importance of global batch norm:
> In distributed training with data parallelism, the BN mean and variance are typically aggregated locally per device. In our contrastive learning, as positive pairs are computed in the same device, the model can exploit the local information leakage to improve prediction accuracy without improving representations. We address this issue by aggregating BN mean and variance over all devices during the training.
Does this implementation handle that?
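For context, in PyTorch the usual way to get this behavior under DistributedDataParallel is to convert the encoder's BatchNorm layers to `SyncBatchNorm` before wrapping the model, so BN statistics are aggregated across all devices rather than per device. A minimal sketch of what I mean (not code from this repo; `resnet50` and `local_rank` are placeholders):

```python
import torch.nn as nn
from torchvision.models import resnet50

# Placeholder: normally read from the launcher's environment (e.g. torchrun),
# and dist.init_process_group(backend="nccl") must already have been called.
local_rank = 0

# Any encoder containing BatchNorm layers; resnet50 is just an example.
net = resnet50()

# Replace every BatchNorm layer with SyncBatchNorm so mean/variance are
# aggregated over all devices during training -- the "global BN" that the
# SimCLR paper describes.
net = nn.SyncBatchNorm.convert_sync_batchnorm(net)

net = nn.parallel.DistributedDataParallel(
    net.cuda(local_rank), device_ids=[local_rank]
)
```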