Robust-Vision-Transformer
A question about "Loss is nan"
Is there anything in the source code that could cause the "loss is NaN" problem on some runs while training proceeds correctly on others?
"loss is nan" is a problem for original ViT models when amp is turned on. Check here for more details.
Since then, many techniques have been proposed to address this problem, e.g., LayerScale. You can refer to these techniques to prevent the training loss from becoming NaN.
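For reference, here is a minimal sketch of LayerScale (as introduced in CaiT), written as a standalone PyTorch module. The idea is a learnable per-channel scale on each residual branch, initialized to a small value so that residual updates start near zero, which tends to stabilize training and helps avoid divergence under mixed precision. The `init_value` of 1e-5 is a common choice, not something taken from this repo's code.

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Learnable per-channel scaling for a residual branch (LayerScale sketch).

    gamma is initialized to a small constant, so each block's contribution
    to the residual stream is tiny at the start of training.
    """

    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        # One scale parameter per feature channel, broadcast over batch/tokens.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x
```

In a transformer block it would be used on each sub-layer output, e.g. `x = x + layer_scale_attn(attn(norm1(x)))` followed by `x = x + layer_scale_mlp(mlp(norm2(x)))`, where the two `LayerScale` instances have separate parameters.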