
optimization problem

Open nguyenkh opened this issue 8 years ago • 6 comments

Hi,

There's something wrong. The validation accuracy has not improved after 5 epochs.

Can you check it out?

5 completed epochs, 96500 batches Average training batch loss: 1.091183 Validation loss: 1.105025 Validation accuracy: 0.338244
5 completed epochs, 96600 batches Average training batch loss: 1.090493 Validation loss: 1.104703 Validation accuracy: 0.338244
5 completed epochs, 96700 batches Average training batch loss: 1.087554 Validation loss: 1.105083 Validation accuracy: 0.338244
5 completed epochs, 96800 batches Average training batch loss: 1.087669 Validation loss: 1.105098 Validation accuracy: 0.338244
5 completed epochs, 96900 batches Average training batch loss: 1.092170 Validation loss: 1.105152 Validation accuracy: 0.338244
5 completed epochs, 97000 batches Average training batch loss: 1.090474 Validation loss: 1.105317 Validation accuracy: 0.338244
5 completed epochs, 97100 batches Average training batch loss: 1.091573 Validation loss: 1.105531 Validation accuracy: 0.338244
5 completed epochs, 97200 batches Average training batch loss: 1.092324 Validation loss: 1.104889 Validation accuracy: 0.338244
5 completed epochs, 97300 batches Average training batch loss: 1.092526 Validation loss: 1.104734 Validation accuracy: 0.338244
5 completed epochs, 97400 batches Average training batch loss: 1.090160 Validation loss: 1.104476 Validation accuracy: 0.338244

nguyenkh avatar Feb 09 '17 18:02 nguyenkh

This is likely a problem with your hyperparameters, since even the training loss isn't improving. Note that your accuracy is stuck at roughly chance level for a three-class problem, and the loss has plateaued near ln 3, which suggests the model is predicting uniformly at random. I would guess you are using a very high learning rate and/or a high dropout rate. Is this the SNLI dataset?
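A quick sanity check of those plateau numbers:

```python
import math

# SNLI has three classes; a classifier that guesses uniformly gets
# ~1/3 accuracy and a cross-entropy loss of ln 3.
print(1 / 3)        # 0.3333... vs. the stuck validation accuracy 0.338244
print(math.log(3))  # 1.0986... vs. the plateaued losses around 1.09-1.10
```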

erickrf avatar Feb 09 '17 18:02 erickrf

I used lrate == 0.05 and dropout == 0.2. Does this make sense? I am training on the SNLI dataset.

nguyenkh avatar Feb 09 '17 18:02 nguyenkh

Ah, note that the -d parameter actually sets the dropout keep probability, so 0.2 means you are zeroing 80% of the inputs. You probably meant to use -d 0.8.

It works that way because I decided to mirror the TensorFlow dropout interface, but I agree it can be confusing.
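For reference, a minimal sketch of the keep-probability semantics, assuming the TensorFlow 1.x API this repo was written against (the variable names are just illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

x = tf.ones([1, 10])

# -d 0.2 means keep_prob=0.2: roughly 80% of activations are zeroed and
# the survivors are scaled by 1/0.2 = 5.0, which cripples training.
too_much = tf.nn.dropout(x, keep_prob=0.2)

# -d 0.8 keeps ~80% of activations: the usual "20% dropout" setting.
intended = tf.nn.dropout(x, keep_prob=0.8)

with tf.Session() as sess:
    print(sess.run(too_much))  # mostly zeros
    print(sess.run(intended))  # mostly 1.25s (the 1/0.8 scaling)
```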

erickrf avatar Feb 09 '17 19:02 erickrf

The problem is still there even with -d 0.8 or 0.2. The validation accuracy never goes above 0.338244.

nguyenkh avatar Feb 09 '17 19:02 nguyenkh

This is strange. I consistently get good results with the configuration mentioned in the readme.

erickrf avatar Feb 11 '17 10:02 erickrf

Hi nguyenkh, erickrf, I ran into the same problem, and I found that my vocabulary file was in the wrong format (word, '\t', index): I had added an index to each vocab entry. This mistake does not abort the program, but it means words in the training data can't find their vectors. After changing the format to one word per line (word, '\n'), it works.
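Roughly, the difference (a hypothetical sketch; load_vocab and vocab.txt are illustrative names, not the repo's actual code):

```python
# Assumes one word per line; the line number is the embedding index.
def load_vocab(path):
    with open(path, encoding='utf-8') as f:
        return {line.rstrip('\n'): i for i, line in enumerate(f)}

vocab = load_vocab('vocab.txt')
# With the broken "word\tindex" format, each key becomes e.g. "dog\t0",
# so vocab.get('dog') returns None and every training token silently
# falls back to the unknown-word vector instead of raising an error.
print(vocab.get('dog'))
```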

whwdreamsky avatar Aug 17 '18 09:08 whwdreamsky