PytorchInsight
Is the operation in SGE-Block equivalent to GroupNorm ?
Hi. I have two questions:
Question 1:
```python
t = t - t.mean(dim=1, keepdim=True)
std = t.std(dim=1, keepdim=True) + 1e-5
t = t / std
t = t.view(b, self.groups, h, w)
t = t * self.weight + self.bias
```
Is this code equivalent to BatchNorm (or GroupNorm)? If so, shouldn't we use running_mean and running_var to stabilize the statistics and improve convergence?
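A minimal sketch of what those lines compute, with placeholder shapes and placeholder `weight`/`bias` tensors (not the module's actual parameters): each (sample, group) row is normalized to zero mean and unit std over its spatial positions. The statistics are per-sample, as in GroupNorm, not per-batch as in BatchNorm, which is why GroupNorm-style layers keep no running_mean/running_var:

```python
import torch

# Hypothetical shapes for illustration only
b, groups, h, w = 2, 4, 8, 8
weight = torch.ones(1, groups, 1, 1)   # placeholder for self.weight
bias = torch.zeros(1, groups, 1, 1)    # placeholder for self.bias

# t holds one (h*w)-long vector per (sample, group) pair
t = torch.randn(b * groups, h * w) * 3.0 + 5.0

# The quoted lines: normalize each row to zero mean / unit std
t = t - t.mean(dim=1, keepdim=True)
std = t.std(dim=1, keepdim=True) + 1e-5  # unbiased std, eps added to std
t = t / std
t = t.view(b, groups, h, w)
t = t * weight + bias

# Statistics were computed per sample and per group, on the fly
print(t.view(b * groups, -1).mean(dim=1))  # ~0 for every row
```

Note two small differences from `nn.GroupNorm`: `torch.std` is unbiased by default, and the epsilon here is added to the std rather than to the variance, so the result is close to but not bit-identical to GroupNorm.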
Question 2:
```python
xn = xn.sum(dim=1, keepdim=True)
```
What is the logic behind this line? Why are we summing along the groups?
Thanks a lot, Tal
For Question 2, I think it is used to reduce the weighted channels within each group to obtain the attention map $a$: at each spatial position, the sum collapses the channel-wise products into a single similarity score between the local feature and the group's global descriptor.
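A small sketch of that reduction under assumed shapes (the variable names and the use of a spatial mean for the global descriptor are illustrative, mirroring an `AdaptiveAvgPool2d(1)`): multiplying the features by the pooled descriptor and summing over dim 1 yields the dot product at every position, i.e. a single-channel attention map per group:

```python
import torch

# Hypothetical shapes: b*groups instances, each with c//groups channels
bg, c_per_group, h, w = 8, 16, 7, 7
x = torch.randn(bg, c_per_group, h, w)

# Global descriptor per group: average over spatial positions
g = x.mean(dim=(2, 3), keepdim=True)       # (bg, c_per_group, 1, 1)

# Channel-wise product, then sum over the group's channels:
# at position (i, j) this equals <x[:, :, i, j], g>,
# one similarity score per position -> the attention map a
xn = x * g
a = xn.sum(dim=1, keepdim=True)            # (bg, 1, h, w)
print(a.shape)  # torch.Size([8, 1, 7, 7])
```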