optimization problem
Hi,
Something seems to be wrong: the validation accuracy has not improved after 5 epochs.
Can you check it out?
5 completed epochs, 96500 batches  Average training batch loss: 1.091183  Validation loss: 1.105025  Validation accuracy: 0.338244
5 completed epochs, 96600 batches  Average training batch loss: 1.090493  Validation loss: 1.104703  Validation accuracy: 0.338244
5 completed epochs, 96700 batches  Average training batch loss: 1.087554  Validation loss: 1.105083  Validation accuracy: 0.338244
5 completed epochs, 96800 batches  Average training batch loss: 1.087669  Validation loss: 1.105098  Validation accuracy: 0.338244
5 completed epochs, 96900 batches  Average training batch loss: 1.092170  Validation loss: 1.105152  Validation accuracy: 0.338244
5 completed epochs, 97000 batches  Average training batch loss: 1.090474  Validation loss: 1.105317  Validation accuracy: 0.338244
5 completed epochs, 97100 batches  Average training batch loss: 1.091573  Validation loss: 1.105531  Validation accuracy: 0.338244
5 completed epochs, 97200 batches  Average training batch loss: 1.092324  Validation loss: 1.104889  Validation accuracy: 0.338244
5 completed epochs, 97300 batches  Average training batch loss: 1.092526  Validation loss: 1.104734  Validation accuracy: 0.338244
5 completed epochs, 97400 batches  Average training batch loss: 1.090160  Validation loss: 1.104476  Validation accuracy: 0.338244
This is likely a problem with your hyperparameters, since even the training loss isn't improving. I would guess you are using a very high learning rate and/or maybe a high dropout. Is this the SNLI dataset?
I used lrate = 0.05 and dropout = 0.2. Does this make sense? I am training on the SNLI dataset.
Ah, I see. The -d parameter actually means the dropout *keep* probability, so 0.2 means you are zeroing 80% of the inputs. You probably meant to use -d 0.8.
It works that way because I decided to follow the TensorFlow dropout interface, but I agree it can be confusing.
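For anyone else tripped up by this: TensorFlow's dropout takes a keep probability, and the surviving units are scaled by 1/keep_prob (inverted dropout). This is a minimal pure-Python sketch of that semantics, not the code this repo actually uses:

```python
import random

def dropout_keep(values, keep_prob, rnd):
    # TensorFlow-style dropout: each unit is KEPT with probability
    # keep_prob; survivors are scaled by 1/keep_prob so the expected
    # activation is unchanged (inverted dropout).
    return [v / keep_prob if rnd.random() < keep_prob else 0.0
            for v in values]

rnd = random.Random(0)
x = [1.0] * 100000

frac_kept = lambda out: sum(1 for v in out if v != 0.0) / len(out)
# -d 0.2 keeps only ~20% of units, i.e. zeroes ~80% of them
print(frac_kept(dropout_keep(x, 0.2, rnd)))
# -d 0.8 keeps ~80%, which is almost certainly what was intended
print(frac_kept(dropout_keep(x, 0.8, rnd)))
```

So -d 0.2 and -d 0.8 behave very differently even though both "look like" a dropout rate of 20%.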
The problem is still there with d=0.8 (and also with 0.2). The validation accuracy never goes above 0.338244.
This is strange. I consistently get good results with the configuration mentioned in the readme.
Hi nguyenkh, erickrf, I ran into the same problem, and I found that my vocab file had the wrong format (word, '\t', index). Because I had added the index to the vocab file, the program did not abort with an error, but the words in the training data could not be matched to their vectors. After changing the format to one word per line (word '\n'), it works.
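To illustrate the failure mode above: if the loader expects one word per line (with the index implied by the line number), then a "word\tindex" file makes the whole tab-separated string the dictionary key, so lookups for the plain word silently miss. This is a hypothetical sketch of such a loader, not the repo's actual code:

```python
import os
import tempfile

def load_vocab(path):
    """Load a vocabulary with one word per line; a word's index is its
    0-based line number.  With a 'word<TAB>index' file, the key would
    be e.g. 'cat\t1' instead of 'cat', so the embedding for 'cat'
    would never be found."""
    with open(path, encoding='utf-8') as f:
        return {line.rstrip('\n'): i for i, line in enumerate(f)}

# tiny demo with a throwaway file
path = os.path.join(tempfile.mkdtemp(), 'vocab.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('the\ncat\nsat\n')
print(load_vocab(path))  # {'the': 0, 'cat': 1, 'sat': 2}
```

With the tab-separated format, every training word would fall back to the unknown-word vector, which matches the ~0.338 accuracy (chance level on SNLI's three classes).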