
nan value

Open jayandral opened this issue 4 years ago • 4 comments

While trying to replicate AdaCos, we find that B_avg tends to inf. Can you help me with this?

m = 0.5, B_avg value before inf = 8.3499e+35

Thanks
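
For reference (an illustrative check, not code from this repo): in the AdaCos paper, B_avg averages exp(s · cosθ) over the non-target classes, so once the adaptive scale s grows large enough the exponentials exceed the float32 range and B_avg turns into inf.

import torch

# Illustrative only: float32 tops out near 3.4e+38, so exp(s * cos_theta)
# overflows to inf once the adaptive scale s grows too large.
print(torch.finfo(torch.float32).max)   # 3.4028e+38
print(torch.exp(torch.tensor(90.0)))    # inf (exp(90) is about 1.2e+39 > float32 max)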

jayandral avatar Aug 02 '19 10:08 jayandral

Are you training with your own dataset? Can you share more details?

4uiiurz1 avatar Aug 03 '19 00:08 4uiiurz1

We are training with VGGFace2 dataset.

jayandral avatar Oct 06 '19 09:10 jayandral

I experienced this issue. It seems related to this other issue.

My fix is to change the optimizer from

# Only the backbone's parameters are passed to the optimizer here,
# so the AdaCos head (metric_fc) is never updated.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr,
                      momentum=args.momentum, weight_decay=args.weight_decay)

to

from itertools import chain

# Include the metric head's parameters as well, so metric_fc is actually trained.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, chain(model.parameters(), metric_fc.parameters())),
                      lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)
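
If you prefer to keep the two parameter sets visible in the optimizer state, an equivalent sketch (assuming the same model, metric_fc, and args names as in the repo's train.py) uses explicit parameter groups:

optimizer = optim.SGD(
    [
        {'params': filter(lambda p: p.requires_grad, model.parameters())},
        {'params': filter(lambda p: p.requires_grad, metric_fc.parameters())},
    ],
    lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)

Either way, the point of the fix is that metric_fc's parameters are actually optimized.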

ahmdtaha avatar Sep 08 '20 11:09 ahmdtaha

I found another issue that causes the nan value.

The scale variable s should be updated during training only, i.e., using the training split. However, it is updated every time the forward method is called, so it is currently updated on both the training and testing splits. I found that this frequently produces nan values. So I changed

with torch.no_grad():
    ........
    self.s = torch.log(B_avg) / torch.cos(torch.min(math.pi/4 * torch.ones_like(theta_med), theta_med))

to

if self.training:
    with torch.no_grad():
        ........
        self.s = torch.log(B_avg) / torch.cos(torch.min(math.pi/4 * torch.ones_like(theta_med), theta_med))

self.training is already defined inside AdaCos because it is an nn.Module, so there is no need to define this variable.
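
For context, here is a minimal, self-contained sketch of an AdaCos head with the scale update guarded by self.training. It follows the formula quoted above and the AdaCos paper; the class/variable names and the exact B_avg computation are my assumptions, not the repo's verbatim code.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaCos(nn.Module):
    # Sketch of an AdaCos head; the adaptive scale s is updated only in training mode.
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.num_classes = num_classes
        # Fixed initial scale from the AdaCos paper: sqrt(2) * log(C - 1)
        self.s = math.sqrt(2) * math.log(num_classes - 1)
        self.W = nn.Parameter(torch.FloatTensor(num_classes, num_features))
        nn.init.xavier_uniform_(self.W)

    def forward(self, input, label):
        # Cosine similarity between L2-normalized features and class centers
        x = F.normalize(input)
        W = F.normalize(self.W)
        logits = F.linear(x, W)
        theta = torch.acos(torch.clamp(logits, -1.0 + 1e-7, 1.0 - 1e-7))
        one_hot = torch.zeros_like(logits)
        one_hot.scatter_(1, label.view(-1, 1), 1)
        # Update the adaptive scale only on the training split (the fix above)
        if self.training:
            with torch.no_grad():
                # B_avg: batch average of exp(s * cos(theta)) summed over non-target classes
                B_avg = torch.where(one_hot < 1,
                                    torch.exp(self.s * logits),
                                    torch.zeros_like(logits))
                B_avg = torch.sum(B_avg) / input.size(0)
                # Median angle to the target class within the batch
                theta_med = torch.median(theta[one_hot == 1])
                self.s = torch.log(B_avg) / torch.cos(
                    torch.min(math.pi / 4 * torch.ones_like(theta_med), theta_med))
        return self.s * logits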

ahmdtaha avatar Oct 28 '20 17:10 ahmdtaha