opensphere
weights normalization
Hello!
What's the reason to skip gradient computation during normalization of classifier weights?
with torch.no_grad():
    self.w.data = F.normalize(self.w.data, dim=0)
You've implemented all sphere losses in this way, so it's not a typo :) I think I'm missing something in your implementation; could you explain it?
Thank you!
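For reference, here is a minimal sketch of the pattern being asked about (a hypothetical CosineHead, not the repo's exact code): the no_grad block only wraps the in-place re-projection of the stored weights onto the unit sphere, while the forward pass still uses self.w in a differentiable matrix product, so gradients do reach the weights.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineHead(nn.Module):
    # Hypothetical cosine-similarity classifier head, for illustration only.
    def __init__(self, feat_dim, num_class):
        super().__init__()
        self.w = nn.Parameter(torch.Tensor(feat_dim, num_class))
        nn.init.xavier_normal_(self.w)

    def forward(self, x):
        # Re-project the stored weights onto the unit sphere in place.
        # no_grad() keeps this bookkeeping step out of the autograd graph:
        # it is a projection of the parameter values, not an operation
        # we want to backpropagate through.
        with torch.no_grad():
            self.w.data = F.normalize(self.w.data, dim=0)
        # Gradients w.r.t. self.w still flow through this matrix product,
        # so the optimizer keeps updating the (unit-norm) weights.
        return F.normalize(x, dim=1).mm(self.w)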
Hello @mnikitin, this is not an answer, I have a question too :)) To my understanding, the weight is, geometrically, the class center. But why is it always initialized from some distribution
self.w = nn.Parameter(torch.Tensor(feat_dim, num_class))
nn.init.xavier_normal_(self.w)
and never updated again?
@tungdop2 hello!
Xavier_normal is one of the default choices for initializing conv and dense layers. Also, in the authors' implementation the classifier weights are actually updated: https://github.com/ydwen/opensphere/blob/main/runner.py#L98
So it looks OK from this perspective. But I'm still not sure why the gradient is skipped during normalization.
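A quick way to convince yourself that the weights are still trained (assuming the hypothetical CosineHead sketch above): run one forward/backward pass and check that self.w receives a gradient.

import torch
import torch.nn.functional as F

# Assumes the hypothetical CosineHead sketch above.
head = CosineHead(feat_dim=512, num_class=10)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

x = torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))

loss = F.cross_entropy(head(x), labels)  # in-place renormalization happens inside forward
loss.backward()

print(head.w.grad is not None)           # True: gradients reach the classifier weights
optimizer.step()                         # the optimizer step updates them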
Thanks @mnikitin. To your question: I think we just optimize the weight after normalizing it and don't need to denormalize it; its original value doesn't need to change, so we don't care about it. Back to my question: so the weight is updated by gradient descent? I don't really understand this step. Another version, from InsightFace, confuses me even more: https://github.com/deepinsight/insightface/blob/149ea0ffae5cda765102bd7c2d28e27429f828e8/recognition/arcface_torch/partial_fc.py#L138
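One possible source of the confusion: there are two common ways to normalize classifier weights. The opensphere snippet above re-projects the stored parameter in place outside autograd, while the InsightFace partial_fc code, if I read it correctly, normalizes the weights differentiably inside the forward pass. A rough sketch of the two patterns (not the exact code at either link):

import torch
import torch.nn.functional as F

w = torch.nn.Parameter(torch.randn(512, 10))
x = torch.randn(8, 512)

# Pattern A (as in the snippet above): re-project the stored parameter onto
# the unit sphere in place, outside autograd, then use it directly;
# gradients flow to w only through the matrix product.
with torch.no_grad():
    w.data = F.normalize(w.data, dim=0)
logits_a = F.normalize(x, dim=1).mm(w)

# Pattern B: keep the stored parameter unnormalized and normalize it
# differentiably in the forward pass; gradients also flow through the
# normalization itself.
logits_b = F.normalize(x, dim=1).mm(F.normalize(w, dim=0))

In either case the parameter is updated by the optimizer via gradient descent; the difference is only whether the normalization step itself is part of the computation graph.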