wav2letter
Empty predictions/100 WER after training conv_glu on a different language
I have been trying to train conv_glu on a custom dataset. The training process starts and continues without errors, but the WER/TER stays at 100 for a large number of epochs while the loss decreases. When I use the saved acoustic model with the Test binary, the predictions are empty, which means the 100% error rate comes from all the characters being deleted! To narrow the problem down to a particular source, I tried using a single example with the alphabet file modified accordingly. The problem still persists! I'm sure I'm missing something quite simple; can anyone point me in the right direction?
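As an aside on why an empty hypothesis pins WER at exactly 100: every reference word counts as a deletion, so the edit distance equals the reference length. A minimal toy sketch (my own Levenshtein-based WER, not wav2letter's implementation):

```python
def edit_distance(ref, hyp):
    """Classic Levenshtein distance over two token lists."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # i deletions
    for j in range(n + 1):
        d[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer(ref, hyp):
    """Word error rate in percent: 100 * (S + D + I) / N."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)

# An empty hypothesis deletes all 6 reference words: WER is exactly 100.
print(wer("the cat sat on the mat", ""))  # 100.0
```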
Attachments
- Token file for single toy example: alphabet.txt
- File and transcription list (same list for Train, Dev and Test): dummy.txt
- Lexicon file (created using the whole corpus, but contains the required words): lexicon (1).txt
- LM (same as the lexicon file, created using the whole corpus): https://drive.google.com/file/d/1qsFyA3DpoHr9F39zIuc5JEsZRfT71eZS/view?usp=sharing
- Test log (contains params and empty transcription): test_log.txt
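For readers without the attachments, the general shape of the token and lexicon files (illustrative made-up entries, not the actual attached files; `|` is the default word-separator token in wav2letter recipes) is roughly:

```
# tokens file: one acoustic token per line
a
b
c
|

# lexicon file: each word followed by its token spelling
cat c a t |
sat s a t |
```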
Could you try running with --showletters=true and post the log here? Could you also post your training log/config?
@tlikhomanenko Similar results with --showletters=true! Log file: test_log (1).txt. On a side note, isn't --show=true the same thing?
- Training Log: 001_log.txt
- Training Config: 001_config.txt
Ohh, you ran Test.cpp, not Decode.cpp. For Decode.cpp, "show" prints the word transcriptions while "showletters" prints the token transcriptions. Test.cpp always prints only the word transcription, so showletters is ignored.
Ok, could you run your Test.cpp with --uselexicon=true? Could you also post the file wav2letter/recipes/models/conv_glu/librispeech/train_ssnl.cfg? So far your config looks fine, so the problem is probably with the training itself: your loss goes down and then gets stuck. One thing I would try is stopping the process after 3 epochs and running Test.cpp, to check that this empty-output state only appears after some training.
@tlikhomanenko Train_ssnl.cfg : train_ssnl.txt
Same results with --uselexicon=true and after stopping the training process after 3 epochs. Just to clarify, is there a difference between the test and decode options when passed to Test.cpp?
Hi,
It looks like you are not using the --linseg= flag for training. You might want to use --linseg=10000 to make sure WER goes below 100. Tuning --lr and --lrcrit might also help.
(In older versions of wav2letter, you should use linseg=1 instead, as we used to count in epochs before.)
@vineelpratap the config here uses the CTC criterion, not ASG.
Oops, nevermind then !
Yeah, it's CTC. Any idea why this is happening though?
First, is this audio really 285476 ms ≈ 5 min with this short transcription? If so, it makes total sense for a model trained on this one sample to predict all blanks/silence. Could you confirm that the audio duration and the short transcription are correct?
That's the size of the audio file, not the duration. The duration is much smaller, about 4 s. But is it required that the number be the time in milliseconds? The documentation says it is just a real number used to sort the data (which can be the audio duration)!
From Data Preparation
size - a real number used for sorting the dataset (typically audio duration in milliseconds).
Yep, correct, you can use any number. I was just wondering whether the problem was training with a very long input.
No idea for now why it doesn't work. First I would take one LibriSpeech sample and try to train on it. If you have the same problem, post the sample and all your config files here and I will try to run the same thing and debug what is not working.