Associative_LSTM
Convergence rate?
Hi,
Thank you for releasing your implementation! I'm testing the model in Torch and having difficulty replicating the results from the original paper, so I wanted to compare against your implementation to make sure there isn't a bug in mine. (In my experiments, a naive LSTM, even without forget-gate initialization, performs much better than reported in the paper, and the Associative LSTM has comparable performance per number of updates.)
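For context, the forget-gate initialization mentioned above is just setting the forget-gate bias to a positive constant so the cell state is retained early in training. A minimal numpy sketch of that idea (the gate ordering, small-weight init, and bias value 1.0 are my own illustrative choices, not taken from either implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step; stacked gate order here is [input, forget, cell, output]."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(0.0, 0.1, (4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
# Forget-gate bias init: start near "remember everything" (illustrative value 1.0).
b[n_hid:2 * n_hid] = 1.0
```

With the bias at 1.0 and small weights, the forget gate starts around sigmoid(1) ≈ 0.73, so gradients through the cell state survive long lags from the first updates onward.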
I have a few questions:
- I'm having trouble getting the loss down to 0.03, as in the figure in the readme (I copy-pasted my log file below). Does this log look typical, and roughly how long should training take for the loss to reach 0.03?
- Were there any implementation tricks you found important for getting the model to train well? I've zeroed the h-to-u connections as in the paper, and I use a single copy of the HRR memory. I looked at your code and didn't see anything obviously different from mine, but it's possible I'm missing something, so I'm wondering if any likely culprits stand out.
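On the single-copy point: in the paper, items are bound to complex keys with unit-modulus entries by element-wise multiplication, and retrieval crosstalk from superposed items shrinks when you average over redundant copies of the trace, each using an independent permutation of the keys. A toy sketch of that effect (my own illustration of the scheme, not code from either implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_items = 128, 4  # memory width and number of stored key/value pairs

def unitary_key(rng, n):
    # Complex key with unit-modulus entries; its conjugate is an exact inverse.
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))

values = [rng.normal(size=n).astype(complex) for _ in range(n_items)]
keys = [unitary_key(rng, n) for _ in range(n_items)]

def retrieval_noise(n_copies):
    """Mean error retrieving item 0 from `n_copies` redundant traces,
    each binding with an independently permuted copy of the keys."""
    perms = [rng.permutation(n) for _ in range(n_copies)]
    # Each trace superposes all items, bound with that copy's permuted keys.
    traces = [sum(k[p] * v for k, v in zip(keys, values)) for p in perms]
    # Unbind with the conjugate key and average the redundant estimates.
    est = np.mean([np.conj(keys[0][p]) * t for p, t in zip(perms, traces)], axis=0)
    return np.abs(est - values[0]).mean()
```

With a single stored item, retrieval is exact (conj(k) * k * v == v); with several items superposed, one copy gives noisy retrieval and the noise drops roughly as one over the square root of the number of copies, which is one reason a single-copy configuration can plateau at a higher loss.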
Thank you
Logfile:
iterations_done:0
iterations_done:1000
train_CE:0.370743
iterations_done:2000
train_CE:0.174449
iterations_done:3000
train_CE:0.168744
iterations_done:4000
train_CE:0.156446
iterations_done:5000
train_CE:0.141770
iterations_done:6000
train_CE:0.126022
iterations_done:7000
train_CE:0.116777
iterations_done:8000
train_CE:0.108515
iterations_done:9000
train_CE:0.100494
iterations_done:10000
train_CE:0.094278
iterations_done:11000
train_CE:0.089932
iterations_done:12000
train_CE:0.083656
iterations_done:13000
train_CE:0.079061
iterations_done:14000
train_CE:0.071482
iterations_done:15000
train_CE:0.175424
iterations_done:16000
train_CE:0.175366
iterations_done:17000
train_CE:0.174720
iterations_done:18000
train_CE:0.174372
iterations_done:19000
train_CE:0.178491
iterations_done:20000
train_CE:0.173044