Robust-Vision-Transformer
A question about "Loss is nan"
Is there anything in the source code that could cause the "loss is NaN" problem on some runs while training proceeds correctly on others?
"loss is nan" is a problem for original ViT models when amp is turned on. Check here for more details.
Since then, many techniques have been proposed to address this problem, e.g., LayerScale. You can refer to these techniques to prevent the training loss from becoming NaN.
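For reference, here is a minimal sketch of LayerScale (as introduced in CaiT), written as a standalone PyTorch module. The idea is a learnable per-channel scale on each residual branch, initialized to a small value so that residual updates start near zero, which tends to stabilize training and helps avoid divergence under mixed precision. The `init_value` of 1e-5 is a common choice, not something taken from this repo's code.

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Learnable per-channel scaling for a residual branch (LayerScale sketch).

    gamma is initialized to a small constant, so each block's contribution
    to the residual stream is tiny at the start of training.
    """

    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        # One scale parameter per feature channel, broadcast over batch/tokens.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x
```

In a transformer block it would be used on each sub-layer output, e.g. `x = x + layer_scale_attn(attn(norm1(x)))` followed by `x = x + layer_scale_mlp(mlp(norm2(x)))`, where the two `LayerScale` instances have separate parameters.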