
GRU cells

Open · guillitte opened this issue 9 years ago · 8 comments

I added the possibility to use GRU cells.
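
For readers unfamiliar with GRUs, a single timestep in the standard formulation looks roughly like the sketch below (plain Torch tensor ops; this is illustrative only and not necessarily identical to the code in this PR):

```lua
require 'torch'

-- Standard GRU update for one timestep (illustrative sketch only).
-- x: input vector (size D), h_prev: previous hidden state (size H).
-- Wz, Wr, Wh are H x (D+H) weight matrices; bz, br, bh are size-H biases.
local function gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh)
  local xh  = torch.cat(x, h_prev)                 -- [x; h_prev]
  local z   = torch.sigmoid(Wz * xh + bz)          -- update gate
  local r   = torch.sigmoid(Wr * xh + br)          -- reset gate
  local xrh = torch.cat(x, torch.cmul(r, h_prev))  -- reset applied to the state
  local hc  = torch.tanh(Wh * xrh + bh)            -- candidate hidden state
  return h_prev + torch.cmul(z, hc - h_prev)       -- (1 - z) * h_prev + z * hc
end
```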

guillitte avatar Mar 02 '16 19:03 guillitte

Wow, this looks amazing - thanks a bunch! There's even a unit test! I want to look through it in a bit more detail before merging, and I probably won't have time to do so today.

jcjohnson avatar Mar 02 '16 19:03 jcjohnson

Thanks. It could certainly be optimized further, but at least it seems to work fine.

guillitte avatar Mar 02 '16 20:03 guillitte

Any update on this?

JoostvDoorn avatar May 04 '16 09:05 JoostvDoorn

For those interested, I also added a GridGRU layer, adapted from the Grid LSTM paper (http://arxiv.org/abs/1507.01526), in the dev branch.

guillitte avatar May 04 '16 13:05 guillitte

Running a small benchmark of 1000 iterations on tiny-shakespeare (epoch 3.8), I got the following results:

LSTM:

{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

GRU:

{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

GRIDGRU:

{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data/tiny-shakespeare.json","num_layers":3,"input_h5":"data/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}

NB: for GRIDGRU, wordvec_size is the size of the network along the depth dimension, so it should be about the same as rnn_size. A rough sketch of why is given below.
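
To make the wordvec_size point concrete, here is a heavily simplified view of one 2D grid cell in the spirit of the Grid LSTM paper (not the actual GridGRU.lua from the dev branch), reusing a gru_step like the one sketched earlier in the thread. Each cell carries a temporal state of size rnn_size and a depth state of size wordvec_size; the depth state is what flows up through the layers in place of the word vectors, which is why the two sizes should be comparable:

```lua
-- Heavily simplified sketch of one 2D grid cell (illustrative only).
-- h_time has size rnn_size and is passed along the time dimension;
-- h_depth has size wordvec_size and is passed up through the layers.
-- time_params / depth_params are the weight/bias tables for the two GRUs.
local function gridgru_cell(h_time, h_depth, time_params, depth_params)
  -- each dimension runs its own GRU, using the other dimension's state as input
  local new_h_time  = gru_step(h_depth, h_time, unpack(time_params))
  local new_h_depth = gru_step(new_h_time, h_depth, unpack(depth_params))
  return new_h_time, new_h_depth
end
```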

guillitte avatar May 05 '16 09:05 guillitte

@guillitte I wonder how fair this comparison is. GRIDGRU has roughly twice as many parameters as the LSTM, and 2.5 times as many as the GRU. A 3x800 GRIDGRU has roughly the same number of parameters as, say, a 3x1070 LSTM or a 3x1250 GRU. So, on a per-parameter basis, GRU wins hands down in this comparison.
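
As a rough sanity check of those numbers, here is a back-of-the-envelope estimate (weights only, biases ignored), assuming a GRIDGRU layer contains two GRUs of size 800, one along time and one along depth:

```lua
-- Approximate weight counts per layer for the benchmark configs above
-- (wordvec_size 64 for LSTM/GRU, 800 for GRIDGRU, rnn_size 800, 3 layers).
local function lstm_layer(d, h) return 4 * (d + h) * h end
local function gru_layer(d, h)  return 3 * (d + h) * h end

local lstm    = lstm_layer(64, 800)  + 2 * lstm_layer(800, 800)   -- ~13.0M
local gru     = gru_layer(64, 800)   + 2 * gru_layer(800, 800)    -- ~9.8M
local gridgru = 3 * 2 * gru_layer(800, 800)                       -- ~23.0M
print(lstm, gru, gridgru)

-- 3x1070 LSTM and 3x1250 GRU land in the same ~23M ballpark:
print(lstm_layer(64, 1070) + 2 * lstm_layer(1070, 1070))          -- ~23.2M
print(gru_layer(64, 1250)  + 2 * gru_layer(1250, 1250))           -- ~23.7M
```

Under these assumptions the 3x800 GRIDGRU works out to roughly 1.8x the LSTM's weights and 2.4x the GRU's, consistent with the estimate above.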

AlekzNet avatar Nov 06 '16 14:11 AlekzNet

This has been open for a while; would one of the contributors mind merging it?

binary-person avatar Feb 27 '19 22:02 binary-person

@scheng123 An equivalent implementation has also been merged into https://github.com/torch/rnn/ under the name SeqGRU.

JoostvDoorn avatar Feb 28 '19 06:02 JoostvDoorn