multiffn-nli
-np.inf in mask_3d causes numerical instability ?
I have found that using -np.inf in the inter-attention module (the attend part) often leads to NaN loss values, even with gradient clipping or very low learning rates. Replacing it with a large finite negative value such as -1e18 fixes the problem in my case.
Could there be an error in the masking applied before the attention scores are computed?
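For context, a minimal NumPy sketch (not the project's actual code) of one way -np.inf masking can produce NaNs: if every position in a row is masked, for example when a sentence size is reported as zero, the softmax over a row of all -inf evaluates to NaN, whereas a large finite negative value keeps the row finite (a meaningless but harmless uniform distribution):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax with max-subtraction
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# attention scores for one query over 4 key positions
scores = np.array([1.0, 2.0, 0.5, -0.3])
# mask is True where the position is valid; here every position is
# padding, as would happen with an incorrect sentence size of zero
mask = np.array([False, False, False, False])

# masking with -inf: the whole row is -inf, so max is -inf,
# -inf - (-inf) = nan, and the entire softmax row becomes NaN
masked_inf = np.where(mask, scores, -np.inf)
print(softmax(masked_inf))  # -> [nan nan nan nan]

# masking with a large finite value: exp() stays defined, so the
# row comes out finite (uniform over the masked positions)
masked_big = np.where(mask, scores, -1e18)
print(softmax(masked_big))  # -> [0.25 0.25 0.25 0.25]
```

Once a single NaN appears in the attention weights it propagates through the loss and gradients, which would explain why gradient clipping and a low learning rate don't help.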
I have never run into this problem, and I have tried the code with different datasets. Are you sure you provided the correct sentence sizes?
If you are sure this happens, could you provide more information about your training setup, such as the data and some hyperparameters?