
Loss is nan, stopping training

Open JunLiangZ opened this issue 1 year ago • 11 comments

During training, the loss becomes NaN. Why does this happen?

JunLiangZ avatar Feb 27 '24 02:02 JunLiangZ

During training, the loss becomes NaN. Why does this happen?

I ran into this problem too. Have you solved it?

jasscia18 avatar Mar 01 '24 02:03 jasscia18

Maybe try Float32 and reduce learning rate, BF16 can suffer from some stability issue.

radarFudan avatar Mar 03 '24 13:03 radarFudan
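For context, the message in the issue title typically comes from a guard in the training loop that aborts once the loss stops being finite. The following is a minimal Python sketch of such a guard (illustrative only, not the actual Vim training code); lowering the learning rate or switching from BF16 to Float32, as suggested above, are ways to keep the loss finite so this guard never fires:

```python
import math

def check_loss(loss_value):
    """Abort training once the loss is NaN/Inf, mirroring the
    'Loss is nan, stopping training' message in the issue title.
    (Hypothetical sketch, not the actual Vim implementation.)"""
    if not math.isfinite(loss_value):
        raise RuntimeError(f"Loss is {loss_value}, stopping training")
    return loss_value

print(check_loss(0.73))          # a finite loss passes through unchanged
```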

Making sure --if_amp is set to False seems to solve this problem.

zhenyuZ-HUST avatar Mar 12 '24 12:03 zhenyuZ-HUST

Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

sailor-z avatar Apr 18 '24 13:04 sailor-z

Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you solved it?

BranStarkkk avatar May 15 '24 15:05 BranStarkkk

Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you solved it?

Not really. It seems all vision mambas have the same problem.

sailor-z avatar May 16 '24 01:05 sailor-z

Setting AMP=False may work, or just use a lower learning rate.

CacatuaAlan avatar May 17 '24 05:05 CacatuaAlan

Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you solved it?

Not really. It seems all vision mambas have the same problem.

I just changed the backbone from Vim to another vision mamba model, VMamba, and it works.

BranStarkkk avatar May 17 '24 07:05 BranStarkkk

Hi, if_amp = False doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you solved it?

Not really. It seems all vision mambas have the same problem.

I just changed the backbone from Vim to another vision mamba model, VMamba, and it works.

Thanks for the information! I'll look into it.

sailor-z avatar May 17 '24 09:05 sailor-z

Got the same problem. I fixed it by dividing the sum of the forward/backward hidden states by 2, so that the hidden states/residuals of all layers have similar magnitudes. Details here: https://github.com/hustvl/Vim/pull/90

mdchuc avatar May 30 '24 00:05 mdchuc
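A minimal numeric sketch of the fix mdchuc describes (variable names here are illustrative, not the actual code from PR #90): a bidirectional block sums the forward-scan and backward-scan outputs, so the merged state has roughly twice the magnitude of either stream and of the residual it is added to; dividing the sum by 2 restores a comparable scale at every layer.

```python
# Stand-ins for the forward SSM output and the (flipped-back) backward output.
forward_out  = [0.5, -1.0, 0.25]
backward_out = [0.5, -1.0, 0.25]

# Plain sum: magnitude is ~2x that of each individual stream.
summed = [f + b for f, b in zip(forward_out, backward_out)]

# Dividing by 2 (the PR #90 fix) puts the merged state back on the
# same scale as each stream and the residual it joins.
averaged = [s / 2.0 for s in summed]

print(summed)    # [1.0, -2.0, 0.5]
print(averaged)  # [0.5, -1.0, 0.25]
```

Over many stacked layers, that repeated 2x inflation compounds and can push low-precision activations toward overflow, which is consistent with the NaNs appearing under BF16/AMP in this thread.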

@mdchuc, do you have any idea why, in the code, they flip out_b along dim=-1? Shouldn't it be dim=1?

Karn3003 avatar Nov 10 '24 11:11 Karn3003
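To illustrate the question above (a hypothetical sketch, not the actual Vim code): which axis a flip reverses depends entirely on the tensor layout. For a (seq_len, dim) layout, reversing the outer axis flips the token order while reversing the last axis flips channels within each token; but in many Mamba implementations the scan input is laid out (batch, dim, seq_len), in which case dim=-1 is the sequence axis and flipping along it does reverse the tokens.

```python
# One sample with seq_len=3 tokens of dim=2, in (seq_len, dim) layout.
seq_major = [[1, 2], [3, 4], [5, 6]]

# Reversing the sequence axis (like torch.flip(x, dims=[0]) here):
flip_tokens = seq_major[::-1]

# Reversing the last axis (like torch.flip(x, dims=[-1]) here),
# which in THIS layout scrambles channels rather than token order:
flip_channels = [tok[::-1] for tok in seq_major]

print(flip_tokens)    # [[5, 6], [3, 4], [1, 2]]
print(flip_channels)  # [[2, 1], [4, 3], [6, 5]]
```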