
weights normalization

Open mnikitin opened this issue 2 years ago • 4 comments

Hello!

What's the reason for skipping gradient computation during the normalization of the classifier weights?

with torch.no_grad():
    self.w.data = F.normalize(self.w.data, dim=0)

You've implemented all the sphere losses this way, so it's not a typo :) I think I'm missing something in your implementation; could you explain it?
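For context, the snippet sits inside the loss head roughly like this (a minimal sketch; the class name and shapes are illustrative, not the exact opensphere code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereHead(nn.Module):
    # illustrative cosine-classifier head, not the exact opensphere code
    def __init__(self, feat_dim, num_class):
        super().__init__()
        self.w = nn.Parameter(torch.Tensor(feat_dim, num_class))
        nn.init.xavier_normal_(self.w)

    def forward(self, x):
        # the lines in question: re-normalize the stored weights outside the autograd graph
        with torch.no_grad():
            self.w.data = F.normalize(self.w.data, dim=0)
        # the matmul below is tracked as usual
        cos_theta = F.normalize(x, dim=1).mm(self.w)
        return cos_theta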

Thank you!

mnikitin avatar May 23 '22 08:05 mnikitin

Hello @mnikitin, this is not an answer, I have a question too :)) To my understanding, the weight is, geometrically, the class center. But why is it always initialized from some distribution

self.w = nn.Parameter(torch.Tensor(feat_dim, num_class)) 
nn.init.xavier_normal_(self.w)

and never updated again?

tungdop2 avatar May 25 '22 08:05 tungdop2

Hello @tungdop2!

Xavier normal is one of the default choices for initializing conv and dense layers. Also, in the authors' implementation the classifier weights are actually being updated: https://github.com/ydwen/opensphere/blob/main/runner.py#L98

So it looks fine from this perspective. But I'm still not sure about skipping the gradient during the normalization.
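To convince myself the weights still train despite the no_grad block, I ran a small self-contained check (illustrative only, not from the repo):

import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_class = 8, 4
w = nn.Parameter(torch.randn(feat_dim, num_class))
opt = torch.optim.SGD([w], lr=0.1)

x = torch.randn(16, feat_dim)
labels = torch.randint(0, num_class, (16,))

# re-normalize the stored values outside the graph, as in the loss heads
with torch.no_grad():
    w.data = F.normalize(w.data, dim=0)

# the forward pass through w is still tracked
logits = F.normalize(x, dim=1).mm(w)
loss = F.cross_entropy(logits, labels)
loss.backward()

print(w.grad is not None)              # True: w still gets a gradient
before = w.data.clone()
opt.step()
print(torch.allclose(before, w.data))  # False: the optimizer updated w

So, as far as I can tell, only the re-projection of w.data onto the unit sphere is excluded from the graph; the matmul that uses self.w is tracked as usual, and the optimizer keeps updating w.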

mnikitin avatar May 25 '22 10:05 mnikitin

Thanks @mnikitin. To your question: I think we just optimize the weight after normalizing it and never need to denormalize it; its original value doesn't matter to us. Back to my question: so the weight is updated by gradient descent? I don't really understand this step. Another version of this in InsightFace confuses me even more: https://github.com/deepinsight/insightface/blob/149ea0ffae5cda765102bd7c2d28e27429f828e8/recognition/arcface_torch/partial_fc.py#L138
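Coming back to the normalization question, this is the per-iteration cycle I picture (a rough toy sketch with made-up shapes, not the actual runner.py or partial_fc code):

import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_class, batch = 8, 4, 16
w = nn.Parameter(torch.empty(feat_dim, num_class))
nn.init.xavier_normal_(w)
opt = torch.optim.SGD([w], lr=0.1)

for step in range(3):
    x = torch.randn(batch, feat_dim)
    labels = torch.randint(0, num_class, (batch,))

    # project the stored weights back onto the unit sphere, outside the graph
    with torch.no_grad():
        w.data = F.normalize(w.data, dim=0)

    logits = F.normalize(x, dim=1).mm(w)   # tracked, so grad flows into w
    loss = F.cross_entropy(logits, labels)

    opt.zero_grad()
    loss.backward()
    opt.step()   # w drifts off the sphere here; it gets re-projected next iteration

If that picture is right, there is nothing to "denormalize": the optimizer step takes w off the unit sphere, and the next iteration simply projects it back.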

tungdop2 avatar May 26 '22 09:05 tungdop2