Shreyansh Das

4 comments by Shreyansh Das

@anxiangsir I have tried with the exact same config (apart from batch_size and lr) and it still does not seem to converge.

Hi @anxiangsir, with our batch size (1500x4), using AdamW with an LR of 0.001 results in NaNs in the loss. We are using four 80GB A100s and it is...

Hi @anxiangsir, I observed that the ViT architectures converge when we use a CosFace margin (of 0.4), but do not converge (stagnate at a loss of about 20) when ArcFace...
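
For readers comparing the two losses mentioned above, here is a minimal sketch of how each margin changes the target-class logit. CosFace subtracts the margin from the cosine, while ArcFace adds it to the angle. The function names, the scale s=64 and the ArcFace margin of 0.5 are assumptions for illustration (only the CosFace margin of 0.4 comes from the comment), and this is not the repository's implementation.

```python
import math

def cosface_logit(cos_theta, m=0.4, s=64.0):
    """Target-class logit with an additive cosine margin: s * (cos(theta) - m)."""
    return s * (cos_theta - m)

def arcface_logit(cos_theta, m=0.5, s=64.0):
    """Target-class logit with an additive angular margin: s * cos(theta + m).

    m=0.5 and s=64 are assumed values, not taken from the thread.
    """
    # Clamp before acos so rounding error cannot push the input outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return s * math.cos(theta + m)

# Compare the two penalised logits at a few cosine similarities.
for cos_theta in (0.0, 0.5, 0.9):
    print(cos_theta, cosface_logit(cos_theta), arcface_logit(cos_theta))
```

Note that this sketch applies the angular margin unconditionally. Real implementations usually treat the case where theta + m exceeds pi separately, since cos(theta + m) stops decreasing there.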

Thanks @anxiangsir, I will try this out. Can you also share any insights you have on why ViT architectures do not converge when ArcFace loss is used? They seem to...