DeepQA
Lot of overfitting
The pretrained model is giving a lot of overfitted outputs; they don't make sense while chatting. Can you suggest anything to reduce this?
Did you just run interactive mode? I notice it doesn't train properly if you do that.
Similar question for me at #139. I think you have to run the non-interactive mode first, and THEN re-run with interactive.
Notice I omit --test interactive, and it goes into a different (training) mode:
~/Dev/git/Conchylicultor/DeepQA (master)$ ./main.py --modelTag pretrainedv2
Welcome to DeepQA v0.1 !
TensorFlow detected: v1.2.1
Loading dataset from ~/git/Conchylicultor/DeepQA/data/samples/dataset-cornell-length10-filter1-vocabSize40000.pkl
Loaded cornell: 24643 words, 159657 QA
Model creation...
2017-08-10 20:39:55.432684: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 20:39:55.432710: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 20:39:55.432715: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 20:39:55.432719: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Initialize variables...
WARNING: No previous model found, starting from clean directory: ~/git/Conchylicultor/DeepQA/save/model-pretrainedv2
Start training (press Ctrl+C to save and exit)...
----- Epoch 1/30 ; (lr=0.002) -----
Shuffling the dataset...
Training: 1%|▌ | 4/624 [00:25<1:05:31, 6.34s/it]
Hey Eric, I did train it again, but on a different system, so it gave me correct responses. I did manipulate it a bit, though, since it was just giving vague answers most of the time. Its LSTM model is not properly built, so it doesn't give accurate responses.
@Mrinal18 So I'm assuming that if I run the command without --test interactive, I'm essentially retraining the model? Which is why the subsequent chats make more sense?
Also, could you elaborate on your changes? Did you just re-read the white paper and check the code itself?
@Mrinal18 Do you mind sharing the changes you've made to the model?
There seems to be no validation set in this code? That's why it's always overfitting.
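For anyone wanting to experiment with adding one: a minimal sketch of what a held-out validation set plus early stopping could look like. This is not part of DeepQA; the helper names (train_valid_split, should_stop) are hypothetical, and you would have to wire the validation loss computation into the actual training loop yourself.

```python
import random

def train_valid_split(pairs, valid_ratio=0.1, seed=0):
    """Shuffle the QA pairs and hold out a fraction for validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = max(1, int(len(pairs) * valid_ratio))
    return pairs[n_valid:], pairs[:n_valid]

def should_stop(valid_losses, patience=3):
    """Simple early stopping: stop once the validation loss has not
    improved for `patience` consecutive epochs."""
    if len(valid_losses) <= patience:
        return False
    best_so_far = min(valid_losses[:-patience])
    return min(valid_losses[-patience:]) >= best_so_far

# Toy usage: 100 dummy QA pairs, 10% held out.
train_pairs, valid_pairs = train_valid_split(range(100), valid_ratio=0.1)
print(len(train_pairs), len(valid_pairs))  # 90 10
```

The idea is just to evaluate the loss on valid_pairs (which the optimizer never sees) after each epoch, and stop training once it plateaus or rises, instead of always running the full 30 epochs on the training loss alone.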