apex
FusedDenseGeluDense output NAN
apex FusedDenseGeluDense output is NaN
After some training iterations, the output is all NaN. We checked that the input does not contain NaN.
CC @seryilmaz
Any fixes on this? I am seeing the same thing. When using fp16, I noticed that the biases of the FusedDenseGeluDense layers were initialized to inf, which I think is what causes the NaNs. Reinitializing the weights and biases seems to have fixed it so far.
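
In case it helps anyone else, here is a rough sketch of the workaround described above: scan the parameters for inf/NaN and reinitialize only the ones that are bad. This is not apex code. The helper name `reinit_nonfinite_params` is made up, the choice of init (Kaiming-uniform for weights, zeros for biases) is my assumption rather than whatever apex does by default, and the example uses plain `nn.Linear` layers as a stand-in so it runs without apex installed. It walks `named_parameters()` so it should not depend on how the fused layer names its weights and biases.

```python
import math

import torch
import torch.nn as nn


def reinit_nonfinite_params(model: nn.Module) -> list:
    """Reinitialize every parameter that contains inf or NaN.

    Returns the names of the parameters that were reinitialized.
    The init scheme here is an assumption, not apex's default.
    """
    fixed = []
    with torch.no_grad():
        for name, param in model.named_parameters():
            if torch.isfinite(param).all():
                continue  # parameter is healthy, leave it alone
            if param.dim() >= 2:
                # Weight matrices: same scheme nn.Linear uses by default.
                nn.init.kaiming_uniform_(param, a=math.sqrt(5))
            else:
                # Biases and other 1-D parameters: start from zero.
                param.zero_()
            fixed.append(name)
    return fixed


# Stand-in for a model with fused dense layers (plain PyTorch modules here).
model = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 8))

# Simulate the reported symptom: a bias that came up as inf.
with torch.no_grad():
    model[0].bias.fill_(float("inf"))

fixed = reinit_nonfinite_params(model)
print("reinitialized:", fixed)
```

Calling this once right after building the model (and after casting to fp16, if you do that) should show whether any parameter starts out non-finite, which is a quick way to confirm whether you are hitting the same problem.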