Yiluan-Motional

1 comment by Yiluan-Motional

@mcarilli I found one case where we might need min_loss_scale. When I train with AMP, the first several iterations quite often produce NaN gradients. Thus the first usable scaling value...
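To make the concern concrete, here is a minimal sketch of dynamic loss scaling with an optional floor. It is a toy model, not Apex's actual implementation: the class `DynamicLossScaler`, the helper `scale_after_nan_steps`, and the defaults (initial scale 2^16, halve on overflow, double after 2000 clean steps) are assumptions chosen for illustration. It shows how a run of early NaN steps can drive the scale very low unless a minimum is enforced.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; not Apex's real code)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0,
                 growth_interval=2000, min_loss_scale=None):
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self.min_loss_scale = min_loss_scale  # None means no floor
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run, False if skipped."""
        if any(math.isnan(g) or math.isinf(g) for g in grads):
            # Overflow or NaN: shrink the scale and skip this step.
            self.scale /= self.factor
            if self.min_loss_scale is not None:
                # Clamp so the scale never drops below the floor.
                self.scale = max(self.scale, self.min_loss_scale)
            self._good_steps = 0
            return False
        # Clean step: grow the scale after enough consecutive successes.
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.factor
        return True


def scale_after_nan_steps(num_nan_steps, min_loss_scale=None):
    """Simulate a run of consecutive NaN iterations at the start of training."""
    scaler = DynamicLossScaler(min_loss_scale=min_loss_scale)
    for _ in range(num_nan_steps):
        scaler.update([float("nan")])
    return scaler.scale


if __name__ == "__main__":
    print(scale_after_nan_steps(20))                      # no floor
    print(scale_after_nan_steps(20, min_loss_scale=1.0))  # floor at 1.0
```

In this sketch, 20 consecutive NaN steps halve the scale 20 times, while a floor of 1.0 stops the decay there. In Apex itself, a floor can be requested through the `min_loss_scale` argument of `amp.initialize`.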