Huayang Li
Hi @jamfly, yes, I think there are actually two solutions:

1. Detect the operation that overflows, step by step, and address it.
2. Pretrain the ADP model for 3 epochs using FP32, and...
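Below is a minimal sketch of solution 1. It assumes you can record each operation's intermediate values in execution order as plain Python floats. If the model is written in PyTorch, `torch.nn.Module.register_forward_hook` is one way to capture them. The step names and the helpers `fits_in_fp16` and `first_overflow_step` are hypothetical and exist only for illustration. The sketch replays the recorded values and reports the first step whose values no longer fit in FP16, where the largest finite value is 65504.

```python
import math
import struct


def fits_in_fp16(x: float) -> bool:
    """Return True if x is finite and representable in FP16 without overflow."""
    if not math.isfinite(x):
        # inf/nan means an earlier step has already overflowed.
        return False
    try:
        # struct's 'e' format packs an IEEE 754 half-precision float and raises
        # OverflowError for finite values beyond the FP16 range (max 65504).
        struct.pack("<e", x)
    except OverflowError:
        return False
    return True


def first_overflow_step(steps):
    """Walk (name, values) pairs in execution order and return the name of the
    first step containing a value that overflows FP16, or None if all fit."""
    for name, values in steps:
        if not all(fits_in_fp16(v) for v in values):
            return name
    return None


# Hypothetical recorded activations; "attention_logits" contains 80000.0,
# which is larger than the FP16 maximum.
steps = [
    ("embedding", [0.5, -1.2, 3.0]),
    ("attention_logits", [120.0, 80000.0]),
    ("softmax", [0.1, 0.9]),
]
print(first_overflow_step(steps))  # attention_logits
```

Once the offending step is known, that single operation can be addressed (for example, by computing it in FP32) instead of changing the whole run.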
> Have you tried using FP16 from scratch? Will it bring the inf loss back to the normal scale?

Yes, I have tried FP16 from scratch with many hyper-parameters, e.g., different values...
To quickly fix this bug, you could add this line at the top of `main.py`:

```
from prepro import *
```

Or you could reformat the code, such as putting...