MoChA-pytorch
safe_cumprod still causes NaN grad
I tried this MonotonicAttention in my seq2seq model, which works well with vanilla attention, but after training for a while it still ran into NaN gradients. I checked which parameters got NaN gradients: they were all parameters upstream of MonotonicAttention's output. When I deleted the `safe_cumprod` operation, training worked fine, so I think there may be a problem there. Has anyone else tried MonotonicAttention, and what was your experience?
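For context, the usual numerically "safe" cumulative product (the trick used in the TensorFlow monotonic attention code) is computed in log space. A minimal PyTorch sketch, which may not match this repo's exact `safe_cumprod`, looks like this:

```python
import torch

def safe_cumprod(x, dim=-1, eps=1e-10):
    # Cumulative product computed as exp(cumsum(log(x))), with x clamped away
    # from zero so log() stays finite. Entries very close to zero still make
    # the cumulative product tiny, so anything that later divides by it can
    # overflow to inf/NaN -- hence the clipping fix suggested further down.
    return torch.exp(torch.cumsum(torch.log(torch.clamp(x, eps, 1.0)), dim=dim))
```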
I have the same NaN problem. Have you solved it?
For anyone who has this problem: here is the corresponding line from the TensorFlow MoChA implementation.

```python
attention = (
    p_choose_i * cumprod_1mp_choose_i * tf.cumsum(
        previous_attention /
        # Clip cumprod_1mp to avoid divide-by-zero
        tf.clip_by_value(cumprod_1mp_choose_i, 1e-10, 1.0),
        axis=1,
    )
)
```

Look at this line and you have the solution. The key part is `tf.clip_by_value(cumprod_1mp_choose_i, 1e-10, 1.0)`, which keeps the denominator away from zero. Give this a thumbs-up if you found it useful.
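To apply the same fix in PyTorch, `torch.clamp` can play the role of `tf.clip_by_value` in the parallel monotonic attention update. A rough sketch, assuming `p_choose` and `previous_attention` are `(batch, T)` tensors (the names, shapes, and helper below are my assumptions, not this repo's exact API):

```python
import torch

def monotonic_attention(p_choose, previous_attention, eps=1e-10):
    # Exclusive cumulative product of (1 - p_choose): position j uses the
    # product over positions < j. Computed stably in log space.
    shifted = torch.cat(
        [torch.ones_like(p_choose[:, :1]), 1.0 - p_choose[:, :-1]], dim=1)
    cumprod_1mp = torch.exp(
        torch.cumsum(torch.log(torch.clamp(shifted, eps, 1.0)), dim=1))

    # The clamp on the denominator mirrors the tf.clip_by_value call above
    # and is what prevents the divide-by-zero / NaN gradients.
    attention = p_choose * cumprod_1mp * torch.cumsum(
        previous_attention / torch.clamp(cumprod_1mp, eps, 1.0), dim=1)
    return attention
```

The clamp on the denominator is the only change relative to the unclipped version; without it, positions where `cumprod_1mp` underflows to zero produce inf/NaN values that then propagate back through the whole graph during the backward pass.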