6.3: Using LSTM layers instead of GRU layers gives NaN, why?
When I use LSTM instead of GRU, as the "Going even further" part suggests, the stacked-LSTM run gives NaN for both the training loss and val_loss.

Why do LSTM and GRU differ so much, and where does the NaN come from?

When I try the stacked LSTM on GPU, the loss is no longer NaN, but it becomes a very large number. Why do the GPU and CPU results differ so much?

When I switch from RMSprop to Adam on GPU, the loss progression is also strange,
for example 0.8** to 0.7**, then 5*****.*** to 4*****.***, and finally a very large number like 2***********.****.
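
For reference, this is presumably the model being described: the notebook's stacked-GRU setup with LSTM layers swapped in. The input shape `(None, 14)` is assumed from the Jena weather data used in the chapter; the rest follows the notebook.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stacked recurrent model from section 6.3, with GRU replaced by LSTM.
# The 14 input features are assumed from the Jena temperature dataset.
model = keras.Sequential([
    keras.Input(shape=(None, 14)),
    layers.LSTM(32,
                dropout=0.1,
                recurrent_dropout=0.5,   # suspected culprit (see the last reply)
                return_sequences=True),
    layers.LSTM(64,
                dropout=0.1,
                recurrent_dropout=0.5),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.RMSprop(), loss="mae")
model.summary()
```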

Hi Jingmouren,
I ran into the same problem. Have you solved the "nan" result? Thanks.
Best, Haowen
recurrent_dropout causes this
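
A minimal sketch of the workaround implied by the reply above: remove `recurrent_dropout` (or set it to 0) on the LSTM layers. As a side benefit, LSTM/GRU layers without `recurrent_dropout` can typically fall back to the fast cuDNN kernels on GPU. The layer sizes mirror the notebook's stacked model; the input shape `(None, 14)` is again an assumption based on the Jena data.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same stacked LSTM, but without recurrent_dropout.
model = keras.Sequential([
    keras.Input(shape=(None, 14)),
    layers.LSTM(32, dropout=0.1, return_sequences=True),  # no recurrent_dropout
    layers.LSTM(64, dropout=0.1),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.RMSprop(), loss="mae")
```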