Stand-Alone-Self-Attention
Can anyone train ResNet50 successfully without NaN?
Hi, I am having trouble training the ResNet50 model on CIFAR-10. Even with a learning rate of 0.01, the loss suddenly becomes NaN after around 10 epochs, so I am not sure how to train the ResNet50 model. Hoping for a quick reply! Thanks.
I am also having the same issue. Did you solve it yet?
Add BN (batch normalization) to the generated Q, K, V.
Can you elaborate @theFoxofSky ?
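One way to read the "add BN" suggestion, as a minimal PyTorch sketch: apply a separate `BatchNorm2d` to the query, key, and value maps right after the 1x1 convolutions that produce them, so the products fed to the softmax stay in a bounded range. The class name `LocalAttentionWithBN` and its layout are hypothetical and simplified (per-channel attention over a k x k window, no heads, no relative position embeddings); this is not the repository's actual layer, and whether it removes the NaN in this setup is untested.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalAttentionWithBN(nn.Module):
    """Hypothetical, simplified local self-attention with BN on Q, K, V."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2  # assumes an odd kernel size

        # 1x1 convolutions that generate Q, K, V
        self.query_conv = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.key_conv = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.value_conv = nn.Conv2d(in_channels, out_channels, 1, bias=False)

        # The suggested change: one BatchNorm per generated tensor
        self.bn_q = nn.BatchNorm2d(out_channels)
        self.bn_k = nn.BatchNorm2d(out_channels)
        self.bn_v = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.bn_q(self.query_conv(x))
        k = self.bn_k(self.key_conv(x))
        v = self.bn_v(self.value_conv(x))
        c = q.shape[1]

        # Gather the k x k neighbourhood of every pixel: (b, c, k*k, h*w)
        k = F.unfold(k, self.kernel_size, padding=self.padding)
        v = F.unfold(v, self.kernel_size, padding=self.padding)
        k = k.reshape(b, c, -1, h * w)
        v = v.reshape(b, c, -1, h * w)
        q = q.reshape(b, c, 1, h * w)

        # Per-channel attention weights over the window positions
        attn = F.softmax(q * k, dim=2)
        out = (attn * v).sum(dim=2)
        return out.reshape(b, c, h, w)
```

A quick check such as `LocalAttentionWithBN(8, 16)(torch.randn(2, 8, 32, 32))` should return a tensor of shape `(2, 16, 32, 32)`. If NaN still appears, gradient clipping or a lower learning rate may also be worth trying.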