Stéphane Guillitte


The code was written in Python 3.5, and I suspect the problem comes from the differences in string handling between Python 2.7 and 3.5.
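To illustrate the kind of 2.7 vs 3.5 difference I mean (a generic, hedged example, not a snippet from the repo; the file name is a placeholder):

```python
# Python 2.7: open(path).read() returns a byte str, and s[0] is a 1-char str.
# Python 3.5: text mode returns unicode str, and binary mode returns bytes
# whose elements are ints, so indexing-based vocab/encoding code behaves differently.
with open("input.txt", "rb") as f:   # placeholder file name
    data = f.read()

print(type(data[0]))   # Python 2.7 -> <type 'str'>, Python 3.5 -> <class 'int'>

# A 3.x-safe way to work with characters is to decode explicitly:
text = data.decode("utf-8")
print(type(text[0]))   # always a length-1 str in Python 3.x
```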

This is the code I used to load the weights from the numpy files:

```python
import numpy as np
import torch

# Copy the exported numpy arrays into the corresponding PyTorch parameters.
embed.weight.data = torch.from_numpy(np.load("embd.npy"))
rnn.h2o.weight.data = torch.from_numpy(np.load("w.npy")).t()
rnn.h2o.bias.data = torch.from_numpy(np.load("b.npy"))
rnn.layers[0].wx.weight.data = torch.from_numpy(np.load("wx.npy")).t()
rnn.layers[0].wh.weight.data = torch.from_numpy(np.load("wh.npy")).t()
rnn.layers[0].wh.bias.data...
```

I added the lm.py file, which allows retraining the model on new data. It was used to create the model and load the weights.

Things are more complicated than that, because the TF model uses L2 regularization and PyTorch handles this differently. This is why I had to hack the TensorFlow model to produce...

Forgive me, it is not L2 regularization but weight normalization that is the problem. And yes, I extracted the variables with TF code.
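For context, and only as an assumption about what the folding step can look like (the variable and file names below are hypothetical, not the ones from my TF code): weight normalization stores each matrix as a direction tensor `v` and a gain `g` with `w = g * v / ||v||`, so the dense weight has to be reconstructed before it can be copied into a plain PyTorch layer.

```python
import numpy as np

def fold_weight_norm(v, g, axis=1):
    """Rebuild w = g * v / ||v||, with the norm taken over every axis
    except `axis` (the axis g runs along). The axis convention is an
    assumption and depends on how the TF kernel is laid out."""
    reduce_axes = tuple(i for i in range(v.ndim) if i != axis)
    norm = np.sqrt(np.sum(v ** 2, axis=reduce_axes, keepdims=True))
    shape = [1] * v.ndim
    shape[axis] = -1
    return g.reshape(shape) * v / norm

# Hypothetical file names for the extracted weight-norm variables.
v = np.load("wx_v.npy")   # direction tensor
g = np.load("wx_g.npy")   # per-unit gain
np.save("wx.npy", fold_weight_norm(v, g))   # folded dense weight
```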

Thanks. It could certainly be further optimized, but at least it seems to work fine.

For those interested, I also added a GridGRU layer, adapted from http://arxiv.org/abs/1507.01526, in the dev branch.
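For anyone curious about the idea, here is a minimal sketch of a grid-style GRU cell (not the exact layer in the dev branch; the class names, the single shared width, and the order of the two updates are my assumptions): a GRU update is applied along the time axis and a second GRU-style update along the depth axis, following the grid idea of the paper.

```python
import torch
import torch.nn as nn

class GRUUpdate(nn.Module):
    """One GRU-style update of a state h given an input x (both of width `size`)."""
    def __init__(self, size):
        super().__init__()
        self.gates = nn.Linear(2 * size, 2 * size)  # update gate z and reset gate r
        self.cand = nn.Linear(2 * size, size)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], -1))).chunk(2, -1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], -1)))
        return (1 - z) * h + z * h_tilde

class GridGRUCell(nn.Module):
    """Grid GRU cell: gates information along both the time and depth axes."""
    def __init__(self, size):
        super().__init__()
        self.time_update = GRUUpdate(size)   # updates the recurrent (time) state
        self.depth_update = GRUUpdate(size)  # updates the signal passed up the stack

    def forward(self, d, h):
        # d: input from the layer below, h: previous hidden state at this layer
        h_new = self.time_update(d, h)       # time dimension
        d_new = self.depth_update(h_new, d)  # depth dimension, gated by h_new
        return d_new, h_new
```

Note that in the benchmark below, wordvec_size equals rnn_size (800) for the GRIDGRU run, presumably because the depth signal keeps the full hidden width, while the LSTM and GRU runs use a 64-dimensional embedding.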

Running a small benchmark with 1000 iterations on tiny Shakespeare (epoch 3.8), I got the following results:

| Model | val_loss at iteration 1000 |
| --- | --- |
| LSTM | 1.6292 |
| GRU | 1.4682 |
| GRIDGRU | 1.4314 |

Full checkpoint options for each run:

LSTM:

```json
{"i":1000,"val_loss_history":[1.6292053406889],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"lstm","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/lstm","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

GRU:

```json
{"i":1000,"val_loss_history":[1.4681989658963],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gru","lr_decay_every":5,"print_every":1,"wordvec_size":64,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/gru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

GRIDGRU:

```json
{"i":1000,"val_loss_history":[1.4313773946329],"val_loss_history_it":[1000],"forward_backward_times":{},"opt":{"max_epochs":50,"checkpoint_every":1000,"batch_size":50,"memory_benchmark":0,"init_from":"","grad_clip":5,"model_type":"gridgru","lr_decay_every":5,"print_every":1,"wordvec_size":800,"seq_length":50,"input_json":"data\/tiny-shakespeare.json","num_layers":3,"input_h5":"data\/tiny-shakespeare.h5","reset_iterations":1,"rnn_size":800,"dropout":0,"checkpoint_name":"cv\/gridgru","batchnorm":0,"learning_rate":0.0005,"speed_benchmark":0,"gpu_backend":"cuda","lr_decay_factor":0.5,"gpu":0}}
```

NB: for...