CMC
shuffle-bn has no effect on single-GPU
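It appears to me that shuffle-bn has no effect when run on a single GPU. BatchNorm computes its statistics over the whole batch, so permuting the samples within the batch leaves those statistics unchanged.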
Example:
import torch
import torch.nn as nn
(B,C,H,W) = 4,3,2,2
model1 = nn.Sequential(nn.BatchNorm2d(C))
model2 = nn.Sequential(nn.BatchNorm2d(C))
print("Before:")
print(" model1 stats: ", model1[0].running_mean, model1[0].running_var)
print(" model2 stats: ", model2[0].running_mean, model2[0].running_var)
shuffle_ids = torch.randperm(B).long()
x1 = torch.randn(B,C,H,W)*3+1
x2 = x1[shuffle_ids]
model1(x1)
model2(x2)
print("After:")
print(" model1 stats: ", model1[0].running_mean, model1[0].running_var)
print(" model2 stats: ", model2[0].running_mean, model2[0].running_var)
Output:
Before:
model1 stats: tensor([0., 0., 0.]) tensor([1., 1., 1.])
model2 stats: tensor([0., 0., 0.]) tensor([1., 1., 1.])
After:
model1 stats: tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
model2 stats: tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
I guess another approach is necessary on a single GPU. Any thoughts?
Thanks for releasing this code.
The simplest solution would probably be to emulate the multi-GPU implementation on a single GPU:
- Shuffle batch
- Split batch in N parts
- Do N independent batchnorms
- Gather parts
- Unshuffle
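
A minimal sketch of the steps above, not taken from the CMC code. The helper name `forward_with_shuffle_bn` and the `num_splits` parameter are hypothetical. It assumes the encoder is in train mode, so that each call to it computes BatchNorm statistics from only the chunk it is given, and that the batch size is divisible by `num_splits`.

```python
import torch
import torch.nn as nn


def forward_with_shuffle_bn(encoder, x, num_splits):
    """Hypothetical helper: emulate multi-GPU shuffle-bn on one device.

    Assumes `encoder` is in train mode and x.shape[0] is divisible by
    `num_splits`.
    """
    batch_size = x.shape[0]

    # Shuffle the batch and remember how to undo the permutation.
    shuffle_ids = torch.randperm(batch_size, device=x.device)
    unshuffle_ids = torch.argsort(shuffle_ids)  # inverse permutation

    # Split into N parts, as N GPUs would each receive one part.
    chunks = x[shuffle_ids].chunk(num_splits, dim=0)

    # N independent forward passes: each BatchNorm call sees one chunk only,
    # so its batch statistics come from that chunk alone.
    out = torch.cat([encoder(chunk) for chunk in chunks], dim=0)

    # Gather is the concatenation above; now restore the original order.
    return out[unshuffle_ids]


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 3, 2, 2) * 3 + 1
    encoder = nn.Sequential(nn.BatchNorm2d(3))
    out = forward_with_shuffle_bn(encoder, x, num_splits=2)
    print(out.shape)
```

One difference from the multi-GPU case: here a single set of running statistics is updated once per chunk, so N times per step. That only matters if the running statistics are used later in eval mode.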