Stéphane Guillitte


The code was written in Python 3.5, and I suspect the problem comes from the differences in string handling between Python 2.7 and 3.5.
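To illustrate the kind of 2.7 vs 3.5 difference I mean (a generic, hedged example, not a snippet from the repo; the file name is a placeholder):

```python
# Python 2.7: open(path).read() returns a byte str, and s[0] is a 1-char str.
# Python 3.5: text mode returns unicode str, and binary mode returns bytes
# whose elements are ints, so indexing-based vocab/encoding code behaves differently.
with open("input.txt", "rb") as f:   # placeholder file name
    data = f.read()

print(type(data[0]))   # Python 2.7 -> <type 'str'>, Python 3.5 -> <class 'int'>

# A 3.x-safe way to work with characters is to decode explicitly:
text = data.decode("utf-8")
print(type(text[0]))   # always a length-1 str in Python 3.x
```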

This is the code I used to load the weights from the numpy files:

```python
import numpy as np
import torch

# Copy the exported numpy arrays into the corresponding PyTorch parameters.
embed.weight.data = torch.from_numpy(np.load("embd.npy"))
rnn.h2o.weight.data = torch.from_numpy(np.load("w.npy")).t()
rnn.h2o.bias.data = torch.from_numpy(np.load("b.npy"))
rnn.layers[0].wx.weight.data = torch.from_numpy(np.load("wx.npy")).t()
rnn.layers[0].wh.weight.data = torch.from_numpy(np.load("wh.npy")).t()
rnn.layers[0].wh.bias.data...
```

I added the lm.py file, which allows retraining the model on new data. It was used to create the model and load the weights.

Things are more complicated than that, because the TF model uses L2 regularization and PyTorch handles this differently. This is why I had to hack the TensorFlow model to produce...

Forgive me, it is not L2 regularization but weight normalization that is the problem. And yes, I extracted the variables with TF code.
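For context, and only as an assumption about what the folding step can look like (the variable and file names below are hypothetical, not the ones from my TF code): weight normalization stores each matrix as a direction tensor `v` and a gain `g` with `w = g * v / ||v||`, so the dense weight has to be reconstructed before it can be copied into a plain PyTorch layer.

```python
import numpy as np

def fold_weight_norm(v, g, axis=1):
    """Rebuild w = g * v / ||v||, with the norm taken over every axis
    except `axis` (the axis g runs along). The axis convention is an
    assumption and depends on how the TF kernel is laid out."""
    reduce_axes = tuple(i for i in range(v.ndim) if i != axis)
    norm = np.sqrt(np.sum(v ** 2, axis=reduce_axes, keepdims=True))
    shape = [1] * v.ndim
    shape[axis] = -1
    return g.reshape(shape) * v / norm

# Hypothetical file names for the extracted weight-norm variables.
v = np.load("wx_v.npy")   # direction tensor
g = np.load("wx_g.npy")   # per-unit gain
np.save("wx.npy", fold_weight_norm(v, g))   # folded dense weight
```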

Thanks. It could certainly be further optimized, but at least it seems to work fine.

For those interested, I also added a GridGRU layer, adapted from http://arxiv.org/abs/1507.01526, in the dev branch.
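For anyone curious about the idea, here is a minimal sketch of a grid-style GRU cell (not the exact layer in the dev branch; the class names, the single shared width, and the order of the two updates are my assumptions): a GRU update is applied along the time axis and a second GRU-style update along the depth axis, following the grid idea of the paper.

```python
import torch
import torch.nn as nn

class GRUUpdate(nn.Module):
    """One GRU-style update of a state h given an input x (both of width `size`)."""
    def __init__(self, size):
        super().__init__()
        self.gates = nn.Linear(2 * size, 2 * size)  # update gate z and reset gate r
        self.cand = nn.Linear(2 * size, size)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], -1))).chunk(2, -1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], -1)))
        return (1 - z) * h + z * h_tilde

class GridGRUCell(nn.Module):
    """Grid GRU cell: gates information along both the time and depth axes."""
    def __init__(self, size):
        super().__init__()
        self.time_update = GRUUpdate(size)   # updates the recurrent (time) state
        self.depth_update = GRUUpdate(size)  # updates the signal passed up the stack

    def forward(self, d, h):
        # d: input from the layer below, h: previous hidden state at this layer
        h_new = self.time_update(d, h)       # time dimension
        d_new = self.depth_update(h_new, d)  # depth dimension, gated by h_new
        return d_new, h_new
```

Note that in the benchmark below, wordvec_size equals rnn_size (800) for the GRIDGRU run, presumably because the depth signal keeps the full hidden width, while the LSTM and GRU runs use a 64-dimensional embedding.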

Running a small benchmark with 1000 iterations on tiny Shakespeare (epoch 3.8), I got the following results:

| Model | val_loss at iteration 1000 |
| --- | --- |
| LSTM | 1.6292 |
| GRU | 1.4682 |
| GRIDGRU | 1.4314 |

Full checkpoint options for each run:

LSTM:

```json
{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

GRU:

```json
{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

GRIDGRU:

```json
{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

NB: for...