torch-rnn
GRU cells
I added the option to use GRU cells.
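For reference, this is what a GRU cell computes (the standard formulation from Cho et al., 2014, biases omitted; the exact parameterization in the code may differ in minor details):

    r_t = \sigma(W_r x_t + U_r h_{t-1})                      (reset gate)
    z_t = \sigma(W_z x_t + U_z h_{t-1})                      (update gate)
    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))   (candidate state)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (new state)

Compared with an LSTM cell there is no separate memory cell and one fewer gate, which is where the parameter savings come from.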
Wow, this looks amazing - thanks a bunch! There's even a unit test! I want to look through it in a bit more detail before merging, and I probably won't have time to do so today.
Thanks. It could certainly be optimized further, but at least it seems to work fine.
Any update on this?
For those interested, I also added a gridgru model, adapted from the Grid LSTM paper (http://arxiv.org/abs/1507.01526), in the Dev branch.
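For context, here is my reading of that paper (the Dev-branch code may differ in details): a 2-D grid network carries one hidden vector along the time axis and one along the depth axis, and each block updates both with its own GRU applied to the concatenation of the incoming vectors:

    H = [h^{time}; h^{depth}]                                  (concatenated incoming state)
    h'^{(k)} = \mathrm{GRU}^{(k)}(H, h^{(k)}),   k \in \{time, depth\}

so the gating that a stacked GRU applies over time is also applied between layers.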
Running a small benchmark of 1000 iterations on tiny Shakespeare (epoch 3.8), I got the following results:
LSTM :
{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
GRU :
{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
GRIDGRU :
{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}
NB: for GRIDGRU, wordvec_size is the size of the network along the depth dimension, so it should be about the same as rnn_size.
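For anyone who wants to reproduce this, the flag names above are exactly the keys of the opt dumps, so (assuming the Dev branch is checked out) a run should look something like:

    th train.lua -input_h5 data/tiny-shakespeare.h5 \
                 -input_json data/tiny-shakespeare.json \
                 -model_type gridgru -num_layers 3 -rnn_size 800 \
                 -wordvec_size 800 -checkpoint_name cv/gridgru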
@guillitte I wonder how fair this comparison is. GRIDGRU has about twice as many parameters as LSTM, and about 2.5 times as many as GRU. A 3x800 GRIDGRU has roughly the same number of parameters as, say, a 3x1070 LSTM or a 3x1250 GRU. So, in this comparison, GRU wins hands down.
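A back-of-the-envelope count supports this. Per layer, with H-dimensional input and hidden state, ignoring biases and the word-vector layer (a rough sketch; it also assumes the GridGRU block runs one GRU per dimension over the concatenated time/depth state, as in the 2-D Grid LSTM of the paper):

    -- approximate weight counts per layer for H = rnn_size = 800
    local H = 800
    local lstm    = 4 * H * (H + H)        -- 4 gates            -> 5.12M
    local gru     = 3 * H * (H + H)        -- 3 gates            -> 3.84M
    local gridgru = 2 * 3 * H * (2*H + H)  -- 2 dims, 2H input   -> 11.52M
    print(gridgru / lstm, gridgru / gru)   -- ~2.25, ~3.0

which is in the same ballpark as the 2x and 2.5x figures above; the exact ratios depend on implementation details.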
This has been open for a while; mind if one of the contributors merges this?
@scheng123 An equivalent implementation has also been merged into https://github.com/torch/rnn/ under the name SeqGRU.
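If I remember the torch/rnn API correctly, a minimal usage sketch looks like this (the sizes here are made up for illustration):

    require 'rnn'
    -- SeqGRU processes a whole sequence in one call;
    -- the input is a seqLength x batchSize x inputSize tensor.
    local gru = nn.SeqGRU(128, 256)     -- inputSize = 128, outputSize = 256
    local x = torch.randn(50, 32, 128)  -- 50 time steps, batch of 32
    local h = gru:forward(x)            -- 50 x 32 x 256 output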