Chatbot-AI
Killed
When attempting to train, the program will run for a while and then print "Killed":
th train.lua --dataset 50000 --hiddenSize 1000
-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
[==================== 387810/387810 ==========>] Tot: 3s942ms | Step: 0ms
-- Pre-processing data
[==================== 50000/50000 ============>] Tot: 33s312ms | Step: 0ms
-- Removing low frequency words
[==================== 83632/83632 ============>] Tot: 12s831ms | Step: 0ms
Writing data/examples.t7 ...
[==================== 83632/83632 ============>] Tot: 28s333ms | Step: 0ms
Writing data/vocab.t7 ...

Dataset stats:
  Vocabulary size: 25931
  Examples: 83632
Killed
Running the basic readme demo
Tried re-running with a smaller dataset and now get this:
-- Epoch 1 / 50
/home/ubuntu/torch/install/bin/luajit: ...u/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
	[C]: in function 'assert'
	...u/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
	./seq2seq.lua:74: in function 'train'
	train.lua:85: in main chunk
	[C]: in function 'dofile'
	...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x0000cff9
@bobhinkle this could be related to this dependency issue; a solution is posted in https://github.com/llSourcell/Chatbot-AI/issues/1. If it's not that, it could be a memory overflow ("Killed" usually means the OS killed the process for running out of memory). If you are running this locally, I suggest running it on AWS. See ML for Hackers #4 for an AWS walkthrough.
@llSourcell I had the same issue and got "expecting target table".
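The error comes from SequencerCriterion, which requires the target (and the input) to be a table of per-timestep tensors rather than a single tensor. Here's a minimal repro of the failure, assuming the stock rnn package:

```lua
require 'rnn'

-- Minimal repro, assuming the stock rnn package: SequencerCriterion expects
-- a table of per-timestep tensors for both input and target.
local crit = nn.SequencerCriterion(nn.ClassNLLCriterion())

local seqLen, batchSize, nClasses = 3, 2, 5
local input, target = {}, {}
for t = 1, seqLen do
   input[t] = torch.rand(batchSize, nClasses):log()         -- fake log-probabilities
   target[t] = torch.LongTensor(batchSize):random(nClasses) -- class indices in [1, nClasses]
end

print(crit:forward(input, target))  -- works: both arguments are tables

-- Passing the targets as one seqLen x batchSize tensor instead triggers
-- the assert in updateOutput:
local tensorTarget = torch.LongTensor(seqLen, batchSize):random(nClasses)
-- crit:forward(input, tensorTarget) --> "expecting target table"
```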
I don't think it's related to the dependencies because I tried with and without CUDA:
- th train.lua --cuda --dataset 500 --hiddenSize 100 --maxEpoch 10 --saturateEpoch 4
- th train.lua --dataset 500 --hiddenSize 100 --maxEpoch 10 --saturateEpoch 4
The dependency issue was mainly about running with OpenCL.
I'm running on a CentOS machine with Lua 5.3.3. I also tried with Lua 5.1.4.
Temporary solution: comment out the failing asserts in SequencerCriterion.lua. For me the file is at ~/torch/install/share/lua/5.2/rnn/SequencerCriterion.lua because I installed Torch with TORCH_LUA_VERSION=LUA52 ./install.sh; by default you'll find it at ~/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua. Those checks on target don't really seem that necessary.
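The edit looks roughly like this; treat it as a sketch, since the body of updateOutput differs a little between rnn versions:

```lua
-- Approximate excerpt of SequencerCriterion.lua with the type checks
-- commented out; the surrounding code varies between rnn versions.
function SequencerCriterion:updateOutput(input, target)
   self.output = 0
   -- assert(torch.type(input) == 'table', "expecting input table")
   -- assert(torch.type(target) == 'table', "expecting target table")
   for i = 1, #input do
      local criterion = self:getStepCriterion(i)
      -- With the asserts gone, a tensor target also works here, because
      -- target[i] indexes the first dimension of a tensor just as it
      -- indexes a table.
      self.output = self.output + criterion:forward(input[i], target[i])
   end
   return self.output
end
```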
I trained it pretty quickly using th train.lua --dataset 500 --hiddenSize 100 --maxEpoch 10 --saturateEpoch 4 and it works, but the answers aren't that good. Hopefully that's just because of my constraints and not because something else went wrong.
@llSourcell maybe changing the type of decoderTarget would be a better fix?
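Something along these lines, assuming decoderTarget in seq2seq.lua is a tensor indexed by time along its first dimension (targetTable is just an illustrative name, not something in the repo):

```lua
-- Hypothetical sketch for seq2seq.lua: build the table of per-step targets
-- that SequencerCriterion expects, instead of patching the rnn package.
-- decoderTarget is assumed indexed by time along dimension 1.
local targetTable = {}
for t = 1, decoderTarget:size(1) do
   targetTable[t] = decoderTarget[t]  -- one target per decoding step
end
local loss = self.criterion:forward(decoderOutput, targetTable)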
@juharris thanks so much for your posts on these issues. I now have 7 ML for Hackers repos to maintain, with much more content to come, so I may need some help with this one. Could you make a PR with the quick fix you posted? I would really appreciate it and will merge it immediately.
@llSourcell The fix isn't in this repo; it's in the rnn package. Looks like a fix is coming in the original repo: https://github.com/macournoyer/neuralconvo/issues/31
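For anyone following along, my guess at the shape such a fix might take in the rnn package (a sketch under that assumption, not the actual patch): let SequencerCriterion accept tensors as well as tables, treating a tensor's first dimension as time:

```lua
-- Sketch only, not the actual upstream patch: let updateOutput take either
-- tables or tensors, iterating over the first (time) dimension of a tensor.
function SequencerCriterion:updateOutput(input, target)
   self.output = 0
   local nStep
   if torch.isTensor(input) then
      assert(torch.isTensor(target), "expecting target Tensor since input is a Tensor")
      nStep = input:size(1)
   else
      assert(torch.type(target) == 'table', "expecting target table")
      nStep = #input
   end
   for i = 1, nStep do
      local criterion = self:getStepCriterion(i)
      self.output = self.output + criterion:forward(input[i], target[i])
   end
   return self.output
end
```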