
I get the same training error at every epoch


I'm running "th Train.lua -epochSave -learningRateAnnealing 1.1 -trainingSetLMDBPath prepare_datasets/libri_lmdb/train/ -validationSetLMDBPath prepare_datasets/libri_lmdb/test/ -LSTM -hiddenSize 500 -permuteBatch" on the LibriSpeech dataset, but I keep getting essentially the same validation error every epoch, even though the loss keeps going down.

Here's what I get:

Number of parameters: 31576697
[==================== 136/136 ================>] Tot: 1m13s | Step: 646ms
Training Epoch: 1 Average Loss: nan Average Validation WER: 100.09 Average Validation CER: 62.14
Saving model..
[==================== 136/136 ================>] Tot: 1m18s | Step: 566ms
Training Epoch: 2 Average Loss: 7047724391721807312664730917666816.000000 Average Validation WER: 100.05 Average Validation CER: 61.98
Saving model..
[==================== 136/136 ================>] Tot: 1m18s | Step: 588ms
Training Epoch: 3 Average Loss: 3568794773768703940829837988462592.000000 Average Validation WER: 100.05 Average Validation CER: 62.00
Saving model..
[==================== 136/136 ================>] Tot: 1m19s | Step: 555ms
Training Epoch: 4 Average Loss: nan Average Validation WER: 100.05 Average Validation CER: 62.03
Saving model..

How should I resolve this?

byuns9334 avatar Oct 29 '17 12:10 byuns9334

Something definitely looks wrong with the loss (nan and astronomically large values)... could you run the tests in the warp-ctc repo for Torch and make sure the costs are not 0s or infs?
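
If it helps, here's a minimal sanity check along the lines of the warp-ctc Torch tutorial. The function names and the expected cost follow that tutorial; treat the exact API as an assumption if your checkout differs:

require 'warp_ctc'

-- One frame of activations over a 5-symbol alphabet.
-- Zeros everywhere means a uniform softmax of 1/5 per symbol.
local acts = torch.Tensor({{0, 0, 0, 0, 0}}):float()
local grads = torch.zeros(acts:size()):float()
local labels = {{1}}  -- target label sequence for the single utterance
local sizes = {1}     -- number of frames in the utterance

-- With uniform activations and a length-1 label, the expected cost is
-- -log(1/5) ≈ 1.6094. A nan, inf, or 0 here points at a broken warp-ctc build.
print(cpu_ctc(acts, grads, labels, sizes))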

SeanNaren avatar Oct 29 '17 16:10 SeanNaren

@SeanNaren Which command should I run to execute the tests in the warp-ctc repository?

byuns9334 avatar Oct 30 '17 07:10 byuns9334
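
For anyone who finds this later: warp-ctc's tests are compiled alongside the library, so something like the following should work (directory and binary names are assumptions based on the standard CMake layout described in the warp-ctc README; adjust to your checkout):

# Build warp-ctc and its tests per the warp-ctc README.
git clone https://github.com/baidu-research/warp-ctc.git
cd warp-ctc
mkdir build && cd build
cmake ..
make

# Run the compiled test binaries. All cases should pass,
# with no 0, inf, or nan costs reported.
./test_cpu
./test_gpu   # only if built with CUDA support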