PhasedLSTM-Keras

Regression with PhasedLSTM hits a gradient explosion (NaN weights and loss)

Open · hnchang opened this issue 4 years ago · 1 comment

Hello,

When I used PhasedLSTM (PLSTM) for regression (to learn the correlation between an input sequence and an output sequence), the weights and the loss became "nan" at the beginning of the first epoch, even though I used gradient clipping.

The training data is generated as follows (slightly modified from https://fairyonice.github.io/Extract-weights-from-Keras's-LSTM-and-calcualte-hidden-and-cell-states.html):

(screenshot: training_partial_samples)

The model is compiled as follows: model.compile(loss="mean_squared_error", sample_weight_mode="temporal", optimizer=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
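For completeness, here is a minimal, self-contained sketch of how such a model can be built and compiled with gradient clipping enabled on the optimizer. The PhasedLSTM import path and the input/output shapes are assumptions (adjust them to the linked script); clipvalue/clipnorm are standard Keras optimizer arguments:

```python
import keras
from keras.layers import Dense, Input
from keras.models import Model

# Import path is an assumption -- adjust it to however PhasedLSTM is exposed
# in your install of PhasedLSTM-Keras.
from phased_lstm_keras.PhasedLSTM import PhasedLSTM

# Toy shapes, purely illustrative: sequences of 100 timesteps with 1 feature,
# one regression target per timestep (matching sample_weight_mode="temporal").
inputs = Input(shape=(100, 1))
x = PhasedLSTM(32, return_sequences=True)(inputs)
outputs = Dense(1)(x)
model = Model(inputs, outputs)

# Same compile call as above, with gradient clipping added on the optimizer.
# clipvalue (or clipnorm) is a standard Keras optimizer argument; the value
# 1.0 is only an illustrative threshold.
model.compile(
    loss="mean_squared_error",
    sample_weight_mode="temporal",
    optimizer=keras.optimizers.Adam(
        lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0,
        clipvalue=1.0,
    ),
)
```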

After checking the weights in the PLSTM layer, I found the values of the timegate kernel getting larger and larger, until the weights become "nan" (the first two rows in the screenshot below).

(screenshot: large_timegate_weights)
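One way to watch this during training is to dump the PLSTM layer's weights at the end of every epoch. A minimal sketch (get_weights() and LambdaCallback are standard Keras; which layer index holds the PLSTM is an assumption to adjust for your model):

```python
import numpy as np
from keras.callbacks import LambdaCallback

# Sketch: print the largest absolute value of each weight tensor in the PLSTM
# layer after every epoch, to see the timegate kernel grow before it hits NaN.
# model.layers[1] assumes the PLSTM is the second layer -- adjust the index
# (or look the layer up by name) for your model.
plstm_layer = model.layers[1]

def report_weights(epoch, logs):
    for i, w in enumerate(plstm_layer.get_weights()):
        print("epoch %d, weight %d: max|w| = %.4g" % (epoch, i, np.max(np.abs(w))))

weight_monitor = LambdaCallback(on_epoch_end=report_weights)
# model.fit(..., callbacks=[weight_monitor])
```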

When I switch to a standard LSTM (all other settings, including the 0.01 learning rate, unchanged), the loss converges. So I traced the source code of PLSTM, suspecting that the initialization of timegate_kernel matters, but I have been stuck for a long time with little progress.
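If the initialization really is the culprit, one thing to try without modifying the library is to overwrite the timegate kernel right after building the model. A rough sketch (get_weights/set_weights are standard Keras; which slot is the timegate kernel and what a sensible scale would be are assumptions that need checking against the PhasedLSTM-Keras source):

```python
import numpy as np

plstm_layer = model.layers[1]          # adjust index/name for your model
weights = plstm_layer.get_weights()

# Assumption: the timegate kernel is the last weight tensor returned by
# get_weights() -- verify this against the PhasedLSTM-Keras source.
timegate_idx = len(weights) - 1
weights[timegate_idx] = (0.1 * np.random.uniform(
    size=weights[timegate_idx].shape)).astype("float32")
plstm_layer.set_weights(weights)
```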

I am wondering if anyone has had a similar issue? Any suggestions for finding out why the gradients explode are appreciated. The relevant code is at this link:

https://github.com/hnchang/Regression-with-PhasedLSTM/blob/master/reg_plstm.py

Much thanks, James

hnchang avatar Apr 17 '20 10:04 hnchang

Hey James,

I am having a similar issue here. Two things that have worked for me:

  1. Reduce the learning rate (on a schedule or manually)
  2. Clip the gradients to prevent them from exploding: https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/ (a sketch of both is below)
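
A minimal sketch of both suggestions in Keras (ReduceLROnPlateau and clipnorm are standard Keras features; the specific thresholds are only illustrative, and model/x_train/y_train are from your script):

```python
import keras
from keras.callbacks import ReduceLROnPlateau

# 1. Learning-rate schedule: halve the LR whenever the training loss plateaus.
lr_schedule = ReduceLROnPlateau(monitor="loss", factor=0.5, patience=3,
                                min_lr=1e-5, verbose=1)

# 2. Gradient clipping: cap the gradient norm via the optimizer.
#    (clipnorm/clipvalue are standard Keras optimizer arguments; 1.0 is only
#    an illustrative threshold.)
optimizer = keras.optimizers.Adam(lr=0.001, clipnorm=1.0)

model.compile(loss="mean_squared_error",
              sample_weight_mode="temporal",
              optimizer=optimizer)
# model.fit(x_train, y_train, epochs=50, callbacks=[lr_schedule])
```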

I hope this helps.

ntlex avatar May 13 '20 09:05 ntlex