Huayang Li

2 comments by Huayang Li

Hi @jamfly, yes, I think there are two possible solutions:

1. Detect the operation that overflows, step by step, and address it.
2. Pretrain the ADP model for 3 epochs using FP32, and...
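The first option, locating the overflowing operation step by step, can be illustrated with a small sketch. This is not the code used in the thread. It is a toy example that relies only on the Python standard library, and the names `to_fp16`, `first_overflow`, and the three-step pipeline are hypothetical. It rounds every intermediate result to half precision and reports the first step whose output is no longer finite.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to half precision; return +/-inf on overflow."""
    try:
        # The "e" format is IEEE 754 binary16; packing raises OverflowError
        # when the value is too large to represent as a finite FP16 number.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def first_overflow(steps, x):
    """Run the named steps in order, rounding each result to FP16.

    Returns the name of the first step whose output is non-finite,
    or None if every intermediate value stays finite.
    """
    for name, fn in steps:
        x = to_fp16(fn(x))
        if not math.isfinite(x):
            return name
    return None


# Hypothetical pipeline standing in for the layers of a model.
steps = [
    ("scale", lambda v: v * 100.0),
    ("square", lambda v: v * v),
    ("log", lambda v: math.log(v)),
]

print(first_overflow(steps, 3.0))  # 300 * 300 = 90000 exceeds the FP16 maximum of 65504
```

With an input of `3.0` the sketch reports `square`, and with `1.0` it returns `None`. In a real training run the same idea would be applied to tensors, for example by checking each module's output for non-finite values, and the offending operation could then be kept in FP32.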

> Have you tried using FP16 from scratch? Will it bring the inf loss back to a normal scale?

Yes, I have tried FP16 from scratch with many hyper-parameters, e.g., different values...