MoChA-pytorch
safe_cumprod still causes NaN grad
I tried this MonotonicAttention in my seq2seq model, which works well with vanilla attention, but after training for a while it still ran into NaN gradients. I checked which parameters got NaN gradients: they were all parameters upstream of MonotonicAttention's output. When I deleted the `safe_cumprod` operation, training worked fine, so I think there may be a problem there. Has anyone else tried MonotonicAttention, and what was your experience?
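For context, the usual numerically "safe" cumulative product (the trick used in the TensorFlow monotonic attention code) is computed in log space. A minimal PyTorch sketch, which may not match this repo's exact `safe_cumprod`, looks like this:

```python
import torch

def safe_cumprod(x, dim=-1, eps=1e-10):
    # Cumulative product computed as exp(cumsum(log(x))), with x clamped away
    # from zero so log() stays finite. Entries very close to zero still make
    # the cumulative product tiny, so anything that later divides by it can
    # overflow to inf/NaN -- hence the clipping fix suggested further down.
    return torch.exp(torch.cumsum(torch.log(torch.clamp(x, eps, 1.0)), dim=dim))
```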
I have the same NaN problem. Have you solved it?
For anyone who has this problem: here is the corresponding line from the TensorFlow MoChA implementation.

```python
attention = (
    p_choose_i * cumprod_1mp_choose_i * tf.cumsum(
        previous_attention /
        # Clip cumprod_1mp to avoid divide-by-zero
        tf.clip_by_value(cumprod_1mp_choose_i, 1e-10, 1.0),
        axis=1,
    )
)
```

Look at this line and you have the solution. The key part is `tf.clip_by_value(cumprod_1mp_choose_i, 1e-10, 1.0)`, which keeps the denominator away from zero. Give this a thumbs-up if you found it useful.
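To apply the same fix in PyTorch, `torch.clamp` can play the role of `tf.clip_by_value` in the parallel monotonic attention update. A rough sketch, assuming `p_choose` and `previous_attention` are `(batch, T)` tensors (the names, shapes, and helper below are my assumptions, not this repo's exact API):

```python
import torch

def monotonic_attention(p_choose, previous_attention, eps=1e-10):
    # Exclusive cumulative product of (1 - p_choose): position j uses the
    # product over positions < j. Computed stably in log space.
    shifted = torch.cat(
        [torch.ones_like(p_choose[:, :1]), 1.0 - p_choose[:, :-1]], dim=1)
    cumprod_1mp = torch.exp(
        torch.cumsum(torch.log(torch.clamp(shifted, eps, 1.0)), dim=1))

    # The clamp on the denominator mirrors the tf.clip_by_value call above
    # and is what prevents the divide-by-zero / NaN gradients.
    attention = p_choose * cumprod_1mp * torch.cumsum(
        previous_attention / torch.clamp(cumprod_1mp, eps, 1.0), dim=1)
    return attention
```

The clamp on the denominator is the only change relative to the unclipped version; without it, positions where `cumprod_1mp` underflows to zero produce inf/NaN values that then propagate back through the whole graph during the backward pass.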