shuffle-bn has no effect on single-GPU

Open sorenrasmussenai opened this issue 5 years ago • 1 comments

It appears to me that shuffle-bn has no effect when run on a single GPU: shuffling the batch does not change the BatchNorm statistics.

Example:

import torch
import torch.nn as nn

B, C, H, W = 4, 3, 2, 2

# Two identical BatchNorm layers (modules are in training mode by default).
model1 = nn.Sequential(nn.BatchNorm2d(C))
model2 = nn.Sequential(nn.BatchNorm2d(C))
print("Before:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)

# model2 sees the same batch as model1, only in shuffled order.
shuffle_ids = torch.randperm(B)
x1 = torch.randn(B, C, H, W) * 3 + 1
x2 = x1[shuffle_ids]

# A forward pass in training mode updates the running statistics.
model1(x1)
model2(x2)
print("After:")
print("  model1 stats: ", model1[0].running_mean, model1[0].running_var)
print("  model2 stats: ", model2[0].running_mean, model2[0].running_var)

Before:
  model1 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
  model2 stats:  tensor([0., 0., 0.]) tensor([1., 1., 1.])
After:
  model1 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
  model2 stats:  tensor([0.2285, 0.1523, 0.1447]) tensor([1.6193, 1.4863, 1.6332])
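
The statistics match because BatchNorm2d reduces over the batch and spatial dimensions for each channel, and that reduction does not depend on the order of the samples. A small check, assuming the same shapes as in the example above (the seed and tolerances are arbitrary choices), suggests that the normalized outputs are also just a permutation of each other:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
B, C, H, W = 4, 3, 2, 2

bn = nn.BatchNorm2d(C)  # training mode by default, so batch statistics are used
x1 = torch.randn(B, C, H, W) * 3 + 1
shuffle_ids = torch.randperm(B)
x2 = x1[shuffle_ids]

# Per-channel batch mean: reduces over (B, H, W), so sample order is irrelevant.
mean1 = x1.mean(dim=(0, 2, 3))
mean2 = x2.mean(dim=(0, 2, 3))
same_stats = torch.allclose(mean1, mean2, atol=1e-5)

# Normalizing the shuffled batch gives the shuffled normalized batch.
y1 = bn(x1)
y2 = bn(x2)
same_outputs = torch.allclose(y1[shuffle_ids], y2, atol=1e-5)

print("same batch mean:", same_stats)
print("outputs are a permutation of each other:", same_outputs)
```

So with a single BatchNorm over the whole batch, shuffling changes nothing that the network can observe.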

I guess another approach is necessary on a single GPU. Any thoughts?

Thanks for releasing this code.

Feb 06 '20 10:02 sorenrasmussenai

The simplest solution would probably be to emulate the multi-GPU implementation on a single GPU:

1. Shuffle the batch
2. Split the batch into N parts
3. Do N independent batchnorms
4. Gather the parts
5. Unshuffle
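
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the repository's implementation: `split_batchnorm`, `shuffle_bn_forward`, and the `perm` argument are hypothetical names introduced here, and a single `nn.BatchNorm2d` module is reused for every part, so its running statistics are updated once per part, which may differ from the multi-GPU behaviour.

```python
import torch
import torch.nn as nn


def split_batchnorm(bn, x, num_splits):
    """Hypothetical helper: normalize each chunk with its own batch statistics."""
    # Each chunk plays the role of one GPU's local batch.
    chunks = x.chunk(num_splits, dim=0)
    return torch.cat([bn(chunk) for chunk in chunks], dim=0)


def shuffle_bn_forward(bn, x, num_splits, perm=None):
    """Hypothetical helper: shuffle, split, normalize, gather, unshuffle."""
    if perm is None:
        perm = torch.randperm(x.size(0))           # shuffle order
    inverse = torch.argsort(perm)                  # indices that undo the shuffle
    out = split_batchnorm(bn, x[perm], num_splits) # split, N batchnorms, gather
    return out[inverse]                            # restore the original order


torch.manual_seed(0)  # arbitrary seed, only for reproducibility
bn = nn.BatchNorm2d(3)  # training mode by default
x = torch.randn(8, 3, 2, 2) * 3 + 1

# A fixed permutation that mixes samples across the two halves of the batch.
perm = torch.tensor([0, 4, 1, 5, 2, 6, 3, 7])

unshuffled = split_batchnorm(bn, x, num_splits=2)
shuffled = shuffle_bn_forward(bn, x, num_splits=2, perm=perm)
single = shuffle_bn_forward(bn, x, num_splits=1, perm=perm)

print("shuffle changes outputs with 2 parts:", not torch.allclose(unshuffled, shuffled))
print("shuffle is a no-op with 1 part:", torch.allclose(single, bn(x), atol=1e-5))
```

With more than one part, the shuffle decides which samples share batch statistics, so it has an effect; with a single part it reduces to the no-op shown in the original example. In eval mode the module uses its running statistics, so the split would make no difference there.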

Feb 06 '20 12:02 sorenrasmussenai