PhasedLSTM-Keras
Gradient explosion when doing regression with PhasedLSTM
Hello,
When I used PhasedLSTM (PLSTM) for regression (to learn the correlation between an input sequence and an output sequence), both the weights and the loss became "nan" at the beginning of the first epoch, even though I used gradient clipping.
The training data was generated following (with small modifications) https://fairyonice.github.io/Extract-weights-from-Keras's-LSTM-and-calcualte-hidden-and-cell-states.html
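Roughly, the data generation looks like this (a sketch of the idea only; the names and the exact generating function are placeholders, the real script is in the repository linked below):

```python
import numpy as np

def gen_data(n_samples=2000, seq_len=50):
    # Random input sequences; each target timestep is a simple
    # function of the inputs seen so far, so the output sequence
    # is temporally correlated with the input sequence.
    x = np.random.randn(n_samples, seq_len, 1).astype("float32")
    y = (0.1 * np.cumsum(x, axis=1)).astype("float32")
    return x, y

X_train, Y_train = gen_data()
```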
The model is compiled with the following optimizer:
```python
model.compile(loss="mean_squared_error",
              sample_weight_mode="temporal",
              optimizer=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999,
                                              epsilon=1e-08, decay=0.0))
```
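For context, the rest of the model is roughly the following (a sketch only; the layer sizes are placeholders and the import path is an assumption rather than a copy of the linked reg_plstm.py):

```python
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed
# Import path for the PhasedLSTM layer is an assumption; adjust it to your install.
from phased_lstm_keras.PhasedLSTM import PhasedLSTM

model = Sequential()
model.add(PhasedLSTM(32, return_sequences=True, input_shape=(None, 1)))
model.add(TimeDistributed(Dense(1)))
```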
After checking the weights in the PLSTM layer, I found that the values of the timegate kernel grow larger and larger until the weights become "nan" (the first two rows).
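For reference, this is the kind of check I ran (a sketch; the position of the timegate kernel inside get_weights() depends on the layer implementation, so printing the shapes helps identify it):

```python
import numpy as np

# Print shape, magnitude, and NaN status of every weight tensor in the PLSTM layer.
for w in model.layers[0].get_weights():
    print(w.shape, "max abs:", np.abs(w).max(), "contains nan:", np.isnan(w).any())
```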
When I switched to a standard LSTM (all other settings, including the learning rate of 0.01, unchanged), the loss converged. I therefore traced the PLSTM source code, suspecting that the initialization of timegate_kernel matters, but I have been stuck for a long time with little progress.
I am wondering if anyone has had a similar issue. Any suggestions on how to find out why the gradient explodes would be appreciated. The relevant code is at this link:
https://github.com/hnchang/Regression-with-PhasedLSTM/blob/master/reg_plstm.py
Many thanks, James
Hey James,
I am having a similar issue here. Two things that have worked for me:
- Reduce the learning rate (on a schedule or manually)
- Use gradient clipping to keep the gradients from exploding (see the sketch below): https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
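A minimal sketch of both ideas in Keras (the optimizer and callback values here are illustrative, not tuned):

```python
import keras

# Lower starting learning rate plus norm-based gradient clipping on the optimizer.
optimizer = keras.optimizers.Adam(lr=0.001, clipnorm=1.0)
model.compile(loss="mean_squared_error",
              sample_weight_mode="temporal",
              optimizer=optimizer)

# Optionally drop the learning rate further whenever the loss plateaus.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5,
                                              patience=3, min_lr=1e-5)
# model.fit(X_train, Y_train, epochs=50, callbacks=[reduce_lr])
```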
I hope this helps.