Stand-Alone-Self-Attention
Loss is NaN
Hello,
I am testing your ResNet50 model with stem set to True, and at the first training step my loss is NaN and the accuracy is decreasing. Is that a bug?
Also, I didn't see this problem when I trained the ResNet26 model.
Thanks for your comments. I don't have enough GPUs, so I couldn't run experiments on all of the ResNet models. Maybe you can reduce learning_rate, for example to 0.01.
Thank you!
Hi, I am facing issues training the ResNet50 model on CIFAR-10. Even with a learning rate of 0.01, the loss suddenly becomes NaN after around 10 epochs, so I am not sure how to train the ResNet50 model. Hoping for a quick reply! Thanks.
Just as a note, ResNet38 and ResNet26 did run successfully without NaN.
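For anyone debugging the same symptom, it may help to find the exact step where the loss first becomes non-finite and to avoid applying that update. Below is a minimal, framework-agnostic sketch, not code from this repository. The helper name `guarded_lr`, the backoff factor, and the simulated loss values are all hypothetical; in a real training loop the loss value would come from something like `loss.item()`.

```python
import math


def guarded_lr(loss_value, lr, backoff=0.5, min_lr=1e-6):
    """Return (ok, new_lr); ok is False when the loss is NaN or infinite.

    Hypothetical helper: on a non-finite loss the learning rate is
    multiplied by `backoff`, but never drops below `min_lr`.
    """
    if math.isfinite(loss_value):
        return True, lr
    return False, max(lr * backoff, min_lr)


# Simulated per-step losses standing in for real training output (assumed data).
losses = [2.30, 2.10, float("nan"), 1.90, float("inf"), 1.70]

lr = 0.01      # the starting value suggested earlier in this thread
skipped = []   # steps where the loss was non-finite

for step, loss_value in enumerate(losses):
    ok, lr = guarded_lr(loss_value, lr)
    if not ok:
        # In real training: skip the backward pass / optimizer step here.
        skipped.append(step)
        continue
    # In real training: backward pass and optimizer step would go here.

print("skipped steps:", skipped)
print("final learning rate:", lr)
```

Logging the first skipped step shows whether the NaN appears immediately, as in the first report, or only after several epochs, as in the second, which can help narrow down the cause.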