neuralconvo
multilayer support, dataset improvements and more
Hi,
The PR is pretty big, and when I rebased I encountered some conflicts. I decided to comment out the LR decay code since Adam is supposed to handle it, so please keep that in mind when merging.
Some of the new features:
- multilayer LSTM
- enhancements to dataset handling: set the vocab size, shuffle before every epoch, validation set, load from CSV
- switch to SeqLSTM, which allows doubling the network size (I'm training 4 layers of 1024 units on each side with a 10k vocab on a single 4GB GPU)
- dropout
- L2 regularization / weight decay
- early stopping on validation/training loss
I also fixed some major bugs in the perplexity calculation (with the help of @vikram-gupta), as well as some memory-efficiency bugs.
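To make the multilayer + dropout part concrete, here is a minimal sketch of how an n-layer SeqLSTM stack with dropout between layers can be built with the rnn package. This is illustrative only, not the PR's actual code; buildEncoder and the 0.2 dropout value are my own placeholders.
```lua
-- Sketch of a multi-layer SeqLSTM encoder with dropout between layers.
-- Illustrative only; the PR's actual model construction may differ.
require 'rnn'

local function buildEncoder(vocabSize, hiddenSize, numLayers, dropout)
  local enc = nn.Sequential()
  -- seqLen x batchSize word ids -> seqLen x batchSize x hiddenSize embeddings
  enc:add(nn.LookupTable(vocabSize, hiddenSize))
  for i = 1, numLayers do
    enc:add(nn.SeqLSTM(hiddenSize, hiddenSize))
    if dropout > 0 then
      enc:add(nn.Dropout(dropout))
    end
  end
  return enc
end

-- e.g. 4 layers of 1024 units with a 10k vocab, as in the comment above
-- (the 0.2 dropout value is just an example)
local encoder = buildEncoder(10000, 1024, 4, 0.2)
print(encoder)
```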
Awesome work as always! I'm currently running w/ th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000
and will update with results.
Results after th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 30000 --vocabSize 10000
for 50 epochs:
Epoch stats:
Errors: min= 1.298625054104
max= 3.8777817894112
median= 2.3050528590272
mean= 2.3216041151624
std= 0.32521225826956
ppl= 10.192010358467
val loss= 5.8125658944054
val ppl= 334.47625630575
The val ppl increased after each epoch (it started at 121).
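As a quick sanity check on the fixed perplexity calculation: assuming ppl is exp of the mean per-token loss (my assumption, but the numbers line up), the reported values are internally consistent:
```lua
-- assuming ppl = exp(mean loss), checking against the reported stats
print(math.exp(2.3216041151624))  -- ~10.19, matches "ppl"
print(math.exp(5.8125658944054))  -- ~334.48, matches "val ppl"
```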
Eval:
you> hi
neuralconvo> I'm not sure you're not going to be a little.
you> what's your name?
neuralconvo> I'm not sure you're not going to be a little.
you> how old are you?
neuralconvo> I'm not sure you're not going to be a little.
I'm not sure if it's the eval code that is broken or the model. I had similar issues when I switched to SeqLSTM (in the seqlstm branch).
Will try re-training w/ a single layer.
Hi, I think the problem is the small dataset you are using, only 50k examples. Try the full set; I get to a val ppl of 30 that way. The answers will tend to be generic when early stopping on the validation set; you can try to overfit the training data like before with the flag --earlyStopOnTrain.
Even if it overfits the data, don't you find it suspicious that the eval always returns the exact same output?
On master, when evaluating, I get a different output for every input even w/ small datasets. But w/ this one change I got similar behaviour (always the same output), so I suspect it's SeqLSTM.
I'm re-running the training w/ the full dataset and 15k vocab. I'll post results as soon as I've got a couple of epochs done.
You are right, it seems that even when the model should memorize the dataset, it still gives the same response every time. I'll investigate further and update you soon.
I am also getting the same responses when training with the following params -
th train.lua --cuda --hiddenSize 1000 --numLayers 2 --dataset 0 --batchSize 5
Ran one more experiment with only one layer (50 epochs) and am getting the same response :(
th train.lua --cuda --hiddenSize 1000 --numLayers 1 --dataset 0 --batchSize 5
Using these settings (it takes less than an hour to start seeing results): th train.lua --batchSize 128 --hiddenSize 512 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000, I managed to overfit a model that responds differently to different inputs.
It does, however, seem like it takes more time to establish communication between the encoder and the decoder, and the model works mostly as a language model in the first few epochs.
Hi @macournoyer, @vikram-gupta, I added a commit that turns off SeqLSTM by default (it uses LSTM instead) and allows switching it back on with the flag --seqLstm. My experiments show similar results using LSTM and SeqLSTM with the same number of units. I think the lack of variety in the answers originates from the regularisation we introduced (dropout + weight decay); some papers also acknowledge this issue with these kinds of models - check http://arxiv.org/abs/1510.03055
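For reference, a rough sketch of what such a toggle can look like with the rnn package (illustrative only; the function name and wiring in the actual commit may differ):
```lua
-- Sketch of an LSTM/SeqLSTM toggle controlled by a --seqLstm-style flag.
require 'rnn'

local function buildRecurrentLayer(inputSize, hiddenSize, useSeqLstm)
  if useSeqLstm then
    -- SeqLSTM consumes a whole seqLen x batchSize x inputSize tensor at once
    return nn.SeqLSTM(inputSize, hiddenSize)
  else
    -- Sequencer-wrapped LSTM steps over a table of batchSize x inputSize tensors
    return nn.Sequencer(nn.LSTM(inputSize, hiddenSize))
  end
end
```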
@chenb67 thx for the fix and the paper! Will check it out.
I'm re-training w/ this and will see.
Thanks @chenb67
I trained the models with the following params. Note that I used the --seqLstm flag because the code was crashing during evaluation, as we convert the input to a table.
th train.lua --batchSize 64 --hiddenSize 1000 --cuda --numLayers 1 --vocabSize 10000 --dropout 0 --weightDecay 0 --earlyStopOnTrain --dataset 100000 --seqLstm
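I'm not sure of the exact failure in eval here, but for context, the two variants take differently shaped inputs (per the rnn library's conventions), which is the usual source of table-vs-tensor crashes. A minimal illustration, not the actual eval code:
```lua
-- Illustration of the input-format difference between the two LSTM paths.
require 'rnn'

local hiddenSize, seqLen, batchSize = 8, 3, 2

-- nn.SeqLSTM takes a single seqLen x batchSize x inputSize tensor
local seqlstm = nn.SeqLSTM(hiddenSize, hiddenSize)
local tensorInput = torch.randn(seqLen, batchSize, hiddenSize)
print(seqlstm:forward(tensorInput):size())  -- 3x2x8

-- nn.Sequencer(nn.LSTM(...)) takes a table of batchSize x inputSize tensors,
-- so input converted to a table works on this path but not on SeqLSTM
local lstm = nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize))
local tableInput = {}
for t = 1, seqLen do tableInput[t] = torch.randn(batchSize, hiddenSize) end
print(#lstm:forward(tableInput))            -- 3
```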
The results have improved, but we still have something more to do before they are as good as what @macournoyer reported initially. It's surprising that even after nullifying almost all of the changes, the results are still not the same as before. @macournoyer any clues?
you> how are you?
neuralconvo> Oh, you met him...
you> where are you?
neuralconvo> In your place?
you> what is your name?
neuralconvo> You're talking about the precogs...
you> how old are you?
neuralconvo> You're talking about the precogs...
you> where do you live?
neuralconvo> I'm just an eye out.
you> are you intelligent?
neuralconvo> Yes, sir.
you> are you a bot?
neuralconvo> But don't you remember?
you> are you hungry?
neuralconvo> Oh, you met him...
you> hello
neuralconvo> You're talking about the precogs...
After 50 epochs, these were the stats -
Errors:
min= 0.17394069818649
max= 0.61486148644254
median= 0.37594955411701
mean= 0.37832337311441
std= 0.07127508704293
ppl= 1.4598349379268
val loss= 7.2912249430419
val ppl= 1467.3670378147
The error on training kept on going down with each epoch.
Something definitely happened in this branch or recently on master that decreased the (subjective) quality of the responses in eval.th.
It might be in the recent changes I pushed on master, I'm looking into it...