Yiluan-Motional
@mcarilli I found one case where we might need min_loss_scale. In my training with AMP, the first several iterations quite often produce NaN gradients. Thus the first usable scaling value...
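To make the concern concrete, here is a minimal pure-Python sketch of how a dynamic loss scaler could behave when early iterations produce NaN gradients, with and without a floor. It is an illustration under stated assumptions, not Apex's actual implementation: `ToyLossScaler`, `scale_after_bad_steps`, and the defaults (initial scale 2^16, factor 2, growth window 2000) are hypothetical names and values chosen for the example.

```python
import math


class ToyLossScaler:
    """Hypothetical dynamic loss scaler (illustration only, not Apex code)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000, min_scale=None):
        self.scale = init_scale      # current loss scale
        self.factor = factor         # shrink/grow multiplier
        self.window = window         # clean steps required before growing
        self.min_scale = min_scale   # optional floor; None means no floor
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow/NaN: shrink the scale and skip this step.
            self.scale /= self.factor
            if self.min_scale is not None:
                # Clamp so repeated bad steps cannot push the scale below the floor.
                self.scale = max(self.scale, self.min_scale)
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.window:
            # Enough clean steps in a row: try a larger scale again.
            self.scale *= self.factor
            self._good_steps = 0
        return True


def scale_after_bad_steps(n_bad, min_scale=None):
    """Scale left after n_bad consecutive NaN-gradient iterations."""
    scaler = ToyLossScaler(min_scale=min_scale)
    for _ in range(n_bad):
        scaler.update([float("nan")])
    return scaler.scale


print(scale_after_bad_steps(10))                   # no floor
print(scale_after_bad_steps(10, min_scale=128.0))  # floor at 128
```

In this toy model, each bad iteration halves the scale, so a burst of NaN gradients at the start of training drives it down quickly, and with a large growth window it recovers only slowly. A floor caps how far it can fall.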