
Very high WER when I tried to use bpe_500 lexicon on tdnn_lstm_ctc model

Open wangtiance opened this issue 3 years ago • 2 comments

Yes, I know tdnn-lstm is kinda outdated, but it's easier to fit onto our edge device, so I thought it might be worth trying. When I trained it, the loss soon stopped decreasing, and decode.py produced a WER close to 100%, with the hyps almost always empty. My first guess was that 500 labels are too many for tdnn-lstm, so I generated a BPE model with 100 tokens. This time it was able to train, but the WER is still higher than with the phone lexicon. I'm not sure whether there's something wrong with my code or whether bpe_500 just doesn't work for tdnn-lstm.
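For reference, this is roughly how I built the smaller BPE model. It's a minimal sketch using sentencepiece directly rather than the exact icefall prepare script, and the transcript path is just an example:

```python
import sentencepiece as spm

# Train a 100-token BPE model from a plain-text transcript file
# (one utterance per line). "transcript_words.txt" is a placeholder path.
spm.SentencePieceTrainer.train(
    input="transcript_words.txt",
    model_prefix="bpe_100",      # writes bpe_100.model and bpe_100.vocab
    vocab_size=100,              # smaller vocab to try with tdnn-lstm
    model_type="bpe",
    character_coverage=1.0,
)
```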

The code that I used to train and decode is in my repo: https://github.com/wangtiance/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc

Attachments: log-train-2022-07-25-17-04-51-0.txt, recogs-test-clean-no_rescore.txt

wangtiance · Aug 02 '22 09:08

I think there are some problems in your training code. Your hyps are empty. You can print the ids of the output and look at what they are. I suspect that your text labels (or text ids) are wrong during training. You can print y after y = sp.encode(texts, out_type=int) in the compute_loss function of train.py.
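Something like this (a rough sketch, assuming the usual icefall BPE setup; the model path and the example texts here are just placeholders):

```python
import sentencepiece as spm

# Load the BPE model used for training; adjust the path to your lang dir.
sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")

# In compute_loss, `texts` is the list of transcripts for the batch.
texts = ["HELLO WORLD", "ANOTHER UTTERANCE"]

y = sp.encode(texts, out_type=int)
print(y)             # list of token-id lists; should not be empty or all the same id
print(sp.decode(y))  # should round-trip back to the original transcripts
```

If the decoded strings don't match the original transcripts, the targets fed to the CTC loss are wrong.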

luomingshuang · Aug 10 '22 07:08

> I think there are some problems in your training code. Your hyps are empty. You can print the ids of the output and look at what they are. I suspect that your text labels (or text ids) are wrong during training. You can print y after y = sp.encode(texts, out_type=int) in the compute_loss function of train.py.

Thanks for the suggestion. Will look into that.

wangtiance · Aug 10 '22 08:08