transformer-xl

Sensitivity to initial weights causing NaNs?

Open • arvieFrydenlund opened this issue 6 years ago • 1 comment

Hi, I'm getting NaN values in the first forward pass of the model, in the first layer, and they generally first appear in the AC calculation. Could this be caused by the model's initial weights? If so, do you have any advice for avoiding it? I have made some changes to the model, so knowing whether this is a known issue would help me determine whether I have introduced a bug. Thanks.

arvieFrydenlund, Feb 06 '19 17:02

This seldom happens, and with the given hyper-parameters it should not happen at all. However, in my experience it can happen with low probability when div_val > 1, which reduces the word-embedding dimensionality by a factor of div_val for infrequent words. If it happens to you, try setting div_val = 1 or using smaller initial weights by decreasing init_range or init_std. Hope this helps.
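
To make the effect of div_val concrete, here is a minimal sketch of how the embedding width shrinks for less frequent words. It assumes the width is divided by div_val once per frequency cluster (most frequent cluster first). The function name `cluster_embedding_dims` and the example sizes are hypothetical, chosen only for illustration, and are not taken from the repository.

```python
def cluster_embedding_dims(d_embed, div_val, n_clusters):
    """Return the embedding width of each frequency cluster.

    Assumption: cluster i (0 = most frequent words) uses a width of
    d_embed // div_val**i. This is an illustrative helper, not library code.
    """
    if div_val < 1:
        raise ValueError("div_val must be >= 1")
    return [d_embed // (div_val ** i) for i in range(n_clusters)]


if __name__ == "__main__":
    # With div_val > 1 the rarest clusters get very narrow embeddings.
    print(cluster_embedding_dims(1024, 4, 4))
    # With div_val = 1 every cluster keeps the full width.
    print(cluster_embedding_dims(1024, 1, 4))
```

Under this assumption, div_val = 1 keeps every cluster at the full width, which is the first workaround suggested above. The second workaround, lowering init_range or init_std, leaves the widths unchanged and only reduces the scale of the initial weights.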

kimiyoung, Feb 06 '19 17:02