
nan value

Open jayandral opened this issue 4 years ago • 4 comments

While trying to replicate AdaCos, we find that B_avg tends to inf. Can you help me with this?

m = 0.5, B_avg value before inf = 8.3499e+35

Thanks
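
For reference (an illustrative check, not code from this repo): in the AdaCos paper, B_avg averages exp(s · cosθ) over the non-target classes, so once the adaptive scale s grows large enough the exponentials exceed the float32 range and B_avg turns into inf.

import torch

# Illustrative only: float32 tops out near 3.4e+38, so exp(s * cos_theta)
# overflows to inf once the adaptive scale s grows too large.
print(torch.finfo(torch.float32).max)   # 3.4028e+38
print(torch.exp(torch.tensor(90.0)))    # inf (exp(90) is about 1.2e+39 > float32 max)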

jayandral avatar Aug 02 '19 10:08 jayandral

Are you training with your own dataset? Can you share more details?

4uiiurz1 avatar Aug 03 '19 00:08 4uiiurz1

We are training with VGGFace2 dataset.

jayandral avatar Oct 06 '19 09:10 jayandral

I experienced this issue. It seems related to this other issue.

My fix is to change the optimizer from

# Only the backbone's parameters are passed to the optimizer here,
# so the AdaCos head (metric_fc) is never updated.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr,
                      momentum=args.momentum, weight_decay=args.weight_decay)

to

from itertools import chain

# Include the metric head's parameters as well, so metric_fc is actually trained.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, chain(model.parameters(), metric_fc.parameters())),
                      lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)
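
If you prefer to keep the two parameter sets visible in the optimizer state, an equivalent sketch (assuming the same model, metric_fc, and args names as in the repo's train.py) uses explicit parameter groups:

optimizer = optim.SGD(
    [
        {'params': filter(lambda p: p.requires_grad, model.parameters())},
        {'params': filter(lambda p: p.requires_grad, metric_fc.parameters())},
    ],
    lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)

Either way, the point of the fix is that metric_fc's parameters are actually optimized.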

ahmdtaha avatar Sep 08 '20 11:09 ahmdtaha

I found another issue that causes the nan value.

The scale variable s should be updated during training only, i.e., using the training split. However, it is updated every time the forward method is called, so it is currently updated on both the training and testing splits. I found that this frequently produces nan values. So I changed

with torch.no_grad():
    ........
    self.s = torch.log(B_avg) / torch.cos(torch.min(math.pi/4 * torch.ones_like(theta_med), theta_med))

to

if self.training:
    with torch.no_grad():
        ........
        self.s = torch.log(B_avg) / torch.cos(torch.min(math.pi/4 * torch.ones_like(theta_med), theta_med))

self.training is already defined inside AdaCos because it is an nn.Module, so there is no need to define this variable.
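
For context, here is a minimal, self-contained sketch of an AdaCos head with the scale update guarded by self.training. It follows the formula quoted above and the AdaCos paper; the class/variable names and the exact B_avg computation are my assumptions, not the repo's verbatim code.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaCos(nn.Module):
    # Sketch of an AdaCos head; the adaptive scale s is updated only in training mode.
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.num_classes = num_classes
        # Fixed initial scale from the AdaCos paper: sqrt(2) * log(C - 1)
        self.s = math.sqrt(2) * math.log(num_classes - 1)
        self.W = nn.Parameter(torch.FloatTensor(num_classes, num_features))
        nn.init.xavier_uniform_(self.W)

    def forward(self, input, label):
        # Cosine similarity between L2-normalized features and class centers
        x = F.normalize(input)
        W = F.normalize(self.W)
        logits = F.linear(x, W)
        theta = torch.acos(torch.clamp(logits, -1.0 + 1e-7, 1.0 - 1e-7))
        one_hot = torch.zeros_like(logits)
        one_hot.scatter_(1, label.view(-1, 1), 1)
        # Update the adaptive scale only on the training split (the fix above)
        if self.training:
            with torch.no_grad():
                # B_avg: batch average of exp(s * cos(theta)) summed over non-target classes
                B_avg = torch.where(one_hot < 1,
                                    torch.exp(self.s * logits),
                                    torch.zeros_like(logits))
                B_avg = torch.sum(B_avg) / input.size(0)
                # Median angle to the target class within the batch
                theta_med = torch.median(theta[one_hot == 1])
                self.s = torch.log(B_avg) / torch.cos(
                    torch.min(math.pi / 4 * torch.ones_like(theta_med), theta_med))
        return self.s * logits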

ahmdtaha avatar Oct 28 '20 17:10 ahmdtaha