
weights normalization

Open mnikitin opened this issue 2 years ago • 4 comments

Hello!

What's the reason for skipping gradient computation during the normalization of the classifier weights?

with torch.no_grad():
    self.w.data = F.normalize(self.w.data, dim=0)

You've implemented all the sphere losses this way, so it's not a typo :) I think I'm missing something in your implementation; could you explain it?
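For context, the snippet sits inside the loss head roughly like this (a minimal sketch; the class name and shapes are illustrative, not the exact opensphere code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereHead(nn.Module):
    # illustrative cosine-classifier head, not the exact opensphere code
    def __init__(self, feat_dim, num_class):
        super().__init__()
        self.w = nn.Parameter(torch.Tensor(feat_dim, num_class))
        nn.init.xavier_normal_(self.w)

    def forward(self, x):
        # the lines in question: re-normalize the stored weights outside the autograd graph
        with torch.no_grad():
            self.w.data = F.normalize(self.w.data, dim=0)
        # the matmul below is tracked as usual
        cos_theta = F.normalize(x, dim=1).mm(self.w)
        return cos_theta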

Thank you!

mnikitin avatar May 23 '22 08:05 mnikitin

Hello @mnikitin, this is not an answer, I have a question too :)) To my understanding, the weight is, geometrically, the class center. But why is it always initialized from some distribution

self.w = nn.Parameter(torch.Tensor(feat_dim, num_class)) 
nn.init.xavier_normal_(self.w)

and never updated again?

tungdop2 avatar May 25 '22 08:05 tungdop2

Hello @tungdop2!

Xavier normal is one of the default choices for initializing conv and dense layers. Also, in the authors' implementation the classifier weights are actually being updated: https://github.com/ydwen/opensphere/blob/main/runner.py#L98

So it looks fine from this perspective. But I'm still not sure about skipping the gradient during the normalization.
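To convince myself the weights still train despite the no_grad block, I ran a small self-contained check (illustrative only, not from the repo):

import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_class = 8, 4
w = nn.Parameter(torch.randn(feat_dim, num_class))
opt = torch.optim.SGD([w], lr=0.1)

x = torch.randn(16, feat_dim)
labels = torch.randint(0, num_class, (16,))

# re-normalize the stored values outside the graph, as in the loss heads
with torch.no_grad():
    w.data = F.normalize(w.data, dim=0)

# the forward pass through w is still tracked
logits = F.normalize(x, dim=1).mm(w)
loss = F.cross_entropy(logits, labels)
loss.backward()

print(w.grad is not None)              # True: w still gets a gradient
before = w.data.clone()
opt.step()
print(torch.allclose(before, w.data))  # False: the optimizer updated w

So, as far as I can tell, only the re-projection of w.data onto the unit sphere is excluded from the graph; the matmul that uses self.w is tracked as usual, and the optimizer keeps updating w.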

mnikitin avatar May 25 '22 10:05 mnikitin

Thanks @mnikitin. To your question: I think we just optimize the weight after normalizing it and never need to denormalize it; its original value doesn't matter to us. Back to my question: so the weight is updated by gradient descent? I don't really understand this step. Another version of this in InsightFace confuses me even more: https://github.com/deepinsight/insightface/blob/149ea0ffae5cda765102bd7c2d28e27429f828e8/recognition/arcface_torch/partial_fc.py#L138
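Coming back to the normalization question, this is the per-iteration cycle I picture (a rough toy sketch with made-up shapes, not the actual runner.py or partial_fc code):

import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_class, batch = 8, 4, 16
w = nn.Parameter(torch.empty(feat_dim, num_class))
nn.init.xavier_normal_(w)
opt = torch.optim.SGD([w], lr=0.1)

for step in range(3):
    x = torch.randn(batch, feat_dim)
    labels = torch.randint(0, num_class, (batch,))

    # project the stored weights back onto the unit sphere, outside the graph
    with torch.no_grad():
        w.data = F.normalize(w.data, dim=0)

    logits = F.normalize(x, dim=1).mm(w)   # tracked, so grad flows into w
    loss = F.cross_entropy(logits, labels)

    opt.zero_grad()
    loss.backward()
    opt.step()   # w drifts off the sphere here; it gets re-projected next iteration

If that picture is right, there is nothing to "denormalize": the optimizer step takes w off the unit sphere, and the next iteration simply projects it back.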

tungdop2 avatar May 26 '22 09:05 tungdop2