ClipBERT

Gradient overflow

Open datar001 opened this issue 3 years ago • 4 comments

Hi, why does this repo output "Gradient overflow"? I ran the msrvtt_qa task with 1 and 2 1080Ti GPU(s). Can this repo be trained on a single GPU (1080Ti)? Thanks!

[Screenshots: WeChat image 20210821210453, WeChat image 20210821211630]

datar001 avatar Aug 21 '21 13:08 datar001

Hi @datar001, by default we trained on 4-8 V100 GPUs. We have not tried training on a single 1080Ti. The warning itself is expected when training with mixed precision (fp16).

jayleicn avatar Aug 21 '21 15:08 jayleicn

One more piece of information: in our past experience, as long as the loss scaler stays above 1, training should remain stable.

linjieli222 avatar Dec 07 '21 22:12 linjieli222
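
For readers unfamiliar with the mechanism discussed above, here is a minimal pure-Python sketch of dynamic loss scaling. It is an illustration under stated assumptions, not the code this repo or its mixed-precision library actually runs. The class `DynamicLossScaler`, the helpers `to_fp16_range` and `simulate`, and the default values (initial scale, factor, growth interval) are all hypothetical. The idea is that gradients are multiplied by a scale before being stored in fp16. If any scaled gradient is inf or NaN, the step is skipped and the scale is halved, which is when a "Gradient overflow" style warning would be reported.

```python
import math

FP16_MAX = 65504.0  # largest finite value representable in IEEE half precision


def to_fp16_range(x):
    """Mimic fp16 overflow: magnitudes beyond FP16_MAX become +/-inf."""
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return x


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; all defaults are assumptions)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, growth_interval=2000):
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, scaled_grads):
        """Return True if the optimizer step may run, False if it is skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in scaled_grads)
        if overflow:
            # Skip this step and shrink the scale.
            self.scale /= self.factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # After a long run of clean steps, try a larger scale again.
            self.scale *= self.factor
        return True


def simulate(true_grad, steps, scaler):
    """Run `steps` iterations with a constant gradient; return skipped count."""
    skipped = 0
    for _ in range(steps):
        scaled = [to_fp16_range(true_grad * scaler.scale)]
        if not scaler.update(scaled):
            skipped += 1
    return skipped


if __name__ == "__main__":
    healthy = DynamicLossScaler()
    print("skipped:", simulate(10.0, 10, healthy), "scale:", healthy.scale)

    diverged = DynamicLossScaler()
    print("skipped:", simulate(float("inf"), 20, diverged), "scale:", diverged.scale)
```

In the first run, a few early steps are skipped while the scale settles, after which training proceeds; this matches the "expected" warnings described above. In the second run the gradient itself is already non-finite, so every step is skipped and the scale keeps halving until it drops below 1. In this toy model that corresponds to the unstable case the comment above warns about.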

@datar001 Have you solved this problem? I have the same problem when running on 8 RTX 3090 GPUs.

peiswang avatar Dec 24 '21 06:12 peiswang

I hit the same problem on 2x 2080Ti and 2x P6000. Training on the 2080Ti ran longer but still failed after two epochs.

akira-l avatar Dec 13 '22 23:12 akira-l