torch-rnn
Error: error: wrote 0 blocks instead of 1
Hey folks, loving this package. Every so often, training fails with the following error:
/home/ubuntu/torch-cl/install/bin/luajit: /home/ubuntu/torch-cl/install/share/lua/5.1/torch/File.lua:134: write error: wrote 0 blocks instead of 1 at /home/ubuntu/torch-cl/pkg/torch/lib/TH/THDiskFile.c:323
stack traceback:
[C]: in function 'writeInt'
/home/ubuntu/torch-cl/install/share/lua/5.1/torch/File.lua:134: in function 'writeObject'
/home/ubuntu/torch-cl/install/share/lua/5.1/torch/File.lua:226: in function 'writeObject'
/home/ubuntu/torch-cl/install/share/lua/5.1/torch/File.lua:226: in function 'writeObject'
/home/ubuntu/torch-cl/install/share/lua/5.1/torch/File.lua:379: in function 'save'
train.lua:242: in main chunk
[C]: in function 'dofile'
...u/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Note that this problem is not reliably reproducible: the epoch it fails on follows no apparent pattern. One time it was epoch 5, another time epoch 33.
I have successfully completed training sessions on other data before with no errors, so I know that torch-rnn is installed correctly and works.
The data set is about 7 MB of plain text that has been successfully run through preprocess.py.
The hyperparameters/invocation I'm using:
th train.lua -input_h5 data/data.h5 -input_json data/data.json -model_type lstm -num_layers 2 -rnn_size 128 -seq_length 80 -dropout 0.5 -learning_rate 3e-3 -lr_decay_factor 0.8
I've been deliberately changing parameters to get weird results, so the above invocation might raise some eyebrows about what kind of model it would generate. ;)
I get this error as well:
/home/_/torch/install/bin/luajit: /home/_/torch/install/share/lua/5.1/torch/File.lua:210: write error: wrote 17069883 blocks instead of 28443465 at /tmp/luarocks_torch-scm-1-1419/torch7/lib/TH/THDiskFile.c:340
stack traceback:
[C]: in function 'write'
/home/_/torch/install/share/lua/5.1/torch/File.lua:210: in function </home/_/torch/install/share/lua/5.1/torch/File.lua:107>
[C]: in function 'write'
/home/_/torch/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/nn/Module.lua:154: in function 'write'
/home/_/torch/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/nn/Module.lua:154: in function 'write'
/home/_/torch/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
/home/_/torch/install/share/lua/5.1/torch/File.lua:388: in function 'save'
train.lua:242: in main chunk
[C]: in function 'dofile'
...than/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
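Both traces fail inside torch's save call at train.lua:242, and both report that fewer blocks were written than requested. One possible cause, which is only a guess and not confirmed anywhere in this thread, is that the disk holding the checkpoints fills up partway through a run. A minimal Python sketch for ruling that out is below. The directory and the checkpoint size are hypothetical placeholders, so substitute wherever your checkpoints actually go and how large they actually are.

```python
import shutil


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def has_room_for(path, needed_bytes, margin=0.10):
    """True if the filesystem holding `path` has `needed_bytes`
    plus a safety margin (10% by default) available."""
    return free_bytes(path) >= needed_bytes * (1.0 + margin)


if __name__ == "__main__":
    # Hypothetical values: replace with your checkpoint directory and
    # the size of one checkpoint file as observed on disk.
    checkpoint_dir = "."
    checkpoint_size = 250 * 1024 ** 2  # 250 MiB, an assumption

    print("free MiB:", free_bytes(checkpoint_dir) // 1024 ** 2)
    if not has_room_for(checkpoint_dir, checkpoint_size):
        print("not enough free space for another checkpoint")
```

If free space turns out to be fine when the error occurs, this guess is wrong and the cause lies elsewhere (for example a quota or a flaky network mount, which this sketch does not check).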