ClipBERT
Gradient overflow
Hi, why does this repo output "Gradient overflow"? I ran the msrvtt_qa task on 1 and 2 1080 Ti GPU(s). Can this repo be run on a single GPU (1080 Ti)? Thanks!
Hi @datar001, by default we trained on 4-8 V100 GPUs, and we have not tried training on a single 1080 Ti. The warning itself is expected: it comes from the use of mixed precision (fp16).
One more piece of information: from our past experience, as long as the loss scale stays above 1, training should be stable.
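For anyone wondering why the warning shows up at all, here is a minimal sketch of how dynamic loss scaling typically behaves. It is a toy model in plain Python, not the repo's code or any library's API: the class name `DynamicLossScaler`, the helper `is_diverging`, the halve/double factors, and the `growth_interval` value are all assumptions for illustration. Only the "stay above 1" threshold comes from the comment above.

```python
import math


class DynamicLossScaler:
    """Toy model of dynamic loss scaling (hypothetical, for illustration only)."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        # Assumed defaults; real frameworks may use different values.
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # This is the situation a "Gradient overflow" message reports:
            # the step is skipped and the scale is reduced.
            self.scale /= 2.0
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # After enough clean steps, try a larger scale again.
            self.scale *= 2.0
        return True


def is_diverging(scaler, floor=1.0):
    """Flag the case described above: the loss scale has dropped below 1."""
    return scaler.scale < floor
```

Under these assumptions, an occasional skipped step is harmless because the scale recovers after a run of clean steps. Repeated overflows with no recovery push the scale below 1, which is the point where training is likely to fail rather than just warn.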
@datar001 Have you solved this problem? I have the same problem when running on 8 RTX 3090 GPUs.
I met the same problem on 2x 2080 Ti and on 2x P6000. Training on the 2080 Ti lasted longer but still failed after two epochs.