PytorchInsight
Is the operation in SGE-Block equivalent to GroupNorm ?
Hi. I have two questions:
Question 1:
```python
t = t - t.mean(dim=1, keepdim=True)
std = t.std(dim=1, keepdim=True) + 1e-5
t = t / std
t = t.view(b, self.groups, h, w)
t = t * self.weight + self.bias
```
Is this code equivalent to BatchNorm (or GroupNorm)? If so, shouldn't we use running_mean and running_var to stabilize the statistics and improve convergence?
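A minimal sketch of what those lines compute, with placeholder shapes and placeholder `weight`/`bias` tensors (not the module's actual parameters): each (sample, group) row is normalized to zero mean and unit std over its spatial positions. The statistics are per-sample, as in GroupNorm, not per-batch as in BatchNorm, which is why GroupNorm-style layers keep no running_mean/running_var:

```python
import torch

# Hypothetical shapes for illustration only
b, groups, h, w = 2, 4, 8, 8
weight = torch.ones(1, groups, 1, 1)   # placeholder for self.weight
bias = torch.zeros(1, groups, 1, 1)    # placeholder for self.bias

# t holds one (h*w)-long vector per (sample, group) pair
t = torch.randn(b * groups, h * w) * 3.0 + 5.0

# The quoted lines: normalize each row to zero mean / unit std
t = t - t.mean(dim=1, keepdim=True)
std = t.std(dim=1, keepdim=True) + 1e-5  # unbiased std, eps added to std
t = t / std
t = t.view(b, groups, h, w)
t = t * weight + bias

# Statistics were computed per sample and per group, on the fly
print(t.view(b * groups, -1).mean(dim=1))  # ~0 for every row
```

Note two small differences from `nn.GroupNorm`: `torch.std` is unbiased by default, and the epsilon here is added to the std rather than to the variance, so the result is close to but not bit-identical to GroupNorm.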
Question 2:
```python
xn = xn.sum(dim=1, keepdim=True)
```
What is the logic behind this line? Why are we summing along the groups?
Thanks a lot, Tal
For Question 2, I think it is used to reduce the weighted channels within each group to obtain the attention map $a$: at each spatial position, the sum collapses the channel-wise products into a single similarity score between the local feature and the group's global descriptor.
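A small sketch of that reduction under assumed shapes (the variable names and the use of a spatial mean for the global descriptor are illustrative, mirroring an `AdaptiveAvgPool2d(1)`): multiplying the features by the pooled descriptor and summing over dim 1 yields the dot product at every position, i.e. a single-channel attention map per group:

```python
import torch

# Hypothetical shapes: b*groups instances, each with c//groups channels
bg, c_per_group, h, w = 8, 16, 7, 7
x = torch.randn(bg, c_per_group, h, w)

# Global descriptor per group: average over spatial positions
g = x.mean(dim=(2, 3), keepdim=True)       # (bg, c_per_group, 1, 1)

# Channel-wise product, then sum over the group's channels:
# at position (i, j) this equals <x[:, :, i, j], g>,
# one similarity score per position -> the attention map a
xn = x * g
a = xn.sum(dim=1, keepdim=True)            # (bg, 1, h, w)
print(a.shape)  # torch.Size([8, 1, 7, 7])
```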