end2end-asr-pytorch

A problem about LibriSpeech's testing results

Open · ssteven502tw opened this issue 4 years ago · 1 comment

I have a question for you.

Is the low-rank transformer model weak at recognizing longer English sentences (more than 30 words)? I found that the WER is high on them. My test results are as follows:


Epoch 75 ,"Test_clean, WER=15.98%, CER=9.79%" ,"Test_other, WER=31.55%, CER=17.71%"

For example: hyp = "as the chase drives away mary stands bewildered and perplexed on the doorstep her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her"

gold = "as the chaise drives away mary stands bewildered and perplexed on the door step her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her mother disappointment vexation and ill humor surge and swell among those delicate organizations on which the structure and development of the soul so closely depend doing perhaps an irreparable injury"


The later part of the sequence is not recognized. Is there any way to improve this?
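To quantify how much of the error comes from a missing tail rather than from scattered mistakes, a word-level comparison of each hypothesis against its reference can help. The following is a minimal sketch in plain Python; the helper names (`word_edit_distance`, `wer`, `missing_tail_words`) are hypothetical and are not part of end2end-asr-pytorch.

```python
def word_edit_distance(hyp_words, ref_words):
    """Levenshtein distance between two word lists."""
    # prev[j] = distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(hyp, ref):
    """Word error rate of one hypothesis string against one reference string."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp.split(), ref_words) / len(ref_words)


def missing_tail_words(hyp, ref):
    """How many words shorter the hypothesis is than the reference (0 if not shorter)."""
    return max(0, len(ref.split()) - len(hyp.split()))
```

If `missing_tail_words` is large for most long utterances while the words that are present mostly match, the errors are probably deletions caused by the output stopping early, not by poor recognition of individual words.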

Thanks

ssteven502tw, Feb 25 '20 14:02

Hi @ssteven502tw

There are a few things to check. First, check whether the audio is being cut during preprocessing: because these utterances are long, you may have accidentally truncated or limited the audio in the training set. Second, check the maximum sequence length parameter used in training.
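Both checks can be done offline by scanning the training manifest for utterances that exceed the configured limits. This is a rough sketch under assumptions: the `(utt_id, duration_seconds, transcript)` tuple layout, the function name `find_over_limit`, and the two limit arguments are all hypothetical, and character-level labels are assumed by default. Substitute the actual limits and tokenization from your preprocessing and training configuration.

```python
def find_over_limit(samples, max_audio_seconds, max_label_tokens, tokenize=list):
    """Return {utt_id: [reasons]} for utterances exceeding either limit.

    samples: iterable of (utt_id, duration_seconds, transcript) tuples (assumed layout).
    tokenize: maps a transcript to label tokens; list() gives characters (an assumption).
    """
    over = {}
    for utt_id, duration, transcript in samples:
        reasons = []
        if duration > max_audio_seconds:
            reasons.append("audio")
        if len(tokenize(transcript)) > max_label_tokens:
            reasons.append("label")
        if reasons:
            over[utt_id] = reasons
    return over


if __name__ == "__main__":
    # Toy data for illustration only, not taken from LibriSpeech.
    demo = [
        ("utt1", 10.0, "short sentence"),
        ("utt2", 40.0, "a long recording"),
    ]
    print(find_over_limit(demo, max_audio_seconds=35.0, max_label_tokens=400))
```

If many long utterances show up here, the model never sees complete long sequences during training, which would be consistent with hypotheses that stop partway through the reference.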

gentaiscool, Feb 29 '20 00:02