torch-rnn
GRU cells
I added the option to use GRU cells.
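For reference, this is what a GRU cell computes (the standard formulation from Cho et al., 2014, biases omitted; the exact parameterization in the code may differ in minor details):

    r_t = \sigma(W_r x_t + U_r h_{t-1})                      (reset gate)
    z_t = \sigma(W_z x_t + U_z h_{t-1})                      (update gate)
    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))   (candidate state)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (new state)

Compared with an LSTM cell there is no separate memory cell and one fewer gate, which is where the parameter savings come from.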
Wow, this looks amazing - thanks a bunch! There's even a unit test! I want to look through it in a bit more detail before merging, and I probably won't have time to do so today.
Thanks. It could certainly be optimized further, but at least it seems to work fine.
Any update on this?
For those interested, I also added a gridgru model, adapted from the Grid LSTM paper (http://arxiv.org/abs/1507.01526), in the Dev branch.
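For context, here is my reading of that paper (the Dev-branch code may differ in details): a 2-D grid network carries one hidden vector along the time axis and one along the depth axis, and each block updates both with its own GRU applied to the concatenation of the incoming vectors:

    H = [h^{time}; h^{depth}]                                  (concatenated incoming state)
    h'^{(k)} = \mathrm{GRU}^{(k)}(H, h^{(k)}),   k \in \{time, depth\}

so the gating that a stacked GRU applies over time is also applied between layers.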
Running a small benchmark of 1000 iterations on tiny Shakespeare (epoch 3.8), I got the following results:
LSTM :
{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
GRU :
{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
GRIDGRU :
{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
NB: for GRIDGRU, wordvec_size is the size of the network along the depth dimension, so it should be about the same as rnn_size.
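For anyone who wants to reproduce this, the flag names above are exactly the keys of the opt dumps, so (assuming the Dev branch is checked out) a run should look something like:

    th train.lua -input_h5 data/tiny-shakespeare.h5 \
                 -input_json data/tiny-shakespeare.json \
                 -model_type gridgru -num_layers 3 -rnn_size 800 \
                 -wordvec_size 800 -checkpoint_name cv/gridgru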
@guillitte I wonder how fair this comparison is. GRIDGRU has about twice as many parameters as LSTM, and about 2.5 times as many as GRU. A 3x800 GRIDGRU has roughly the same number of parameters as, say, a 3x1070 LSTM or a 3x1250 GRU. So, in this comparison, GRU wins hands down.
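A back-of-the-envelope count supports this. Per layer, with H-dimensional input and hidden state, ignoring biases and the word-vector layer (a rough sketch; it also assumes the GridGRU block runs one GRU per dimension over the concatenated time/depth state, as in the 2-D Grid LSTM of the paper):

    -- approximate weight counts per layer for H = rnn_size = 800
    local H = 800
    local lstm    = 4 * H * (H + H)        -- 4 gates            -> 5.12M
    local gru     = 3 * H * (H + H)        -- 3 gates            -> 3.84M
    local gridgru = 2 * 3 * H * (2*H + H)  -- 2 dims, 2H input   -> 11.52M
    print(gridgru / lstm, gridgru / gru)   -- ~2.25, ~3.0

which is in the same ballpark as the 2x and 2.5x figures above; the exact ratios depend on implementation details.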
This has been open for a while; mind if one of the contributors merges this?
@scheng123 An equivalent implementation has also been merged into https://github.com/torch/rnn/ under the name SeqGRU.
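If I remember the torch/rnn API correctly, a minimal usage sketch looks like this (the sizes here are made up for illustration):

    require 'rnn'
    -- SeqGRU processes a whole sequence in one call;
    -- the input is a seqLength x batchSize x inputSize tensor.
    local gru = nn.SeqGRU(128, 256)     -- inputSize = 128, outputSize = 256
    local x = torch.randn(50, 32, 128)  -- 50 time steps, batch of 32
    local h = gru:forward(x)            -- 50 x 32 x 256 output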