
Has anyone gotten good performance?

Open qute012 opened this issue 3 years ago • 1 comments

Hi. I ran this project for Korean speech recognition, but the loss is not decreasing and I don't get good predictions. I am already using a preprocessing method that works well with DeepSpeech and LAS.

The model looks similar to DeepSpeech-style architectures, but it is not the same: the paper uses HMMs pre-built with Kaldi and LF-MMI training instead of CTC.

https://www.danielpovey.com/files/2020_interspeech_multistream.pdf

The paper above, which this project references, uses a single stream before the multi-stream stage. Does anyone know what the problem is, or has anyone gotten good performance with this project?

qute012 avatar Aug 02 '20 12:08 qute012

I tried the LibriSpeech train-clean-100 dataset, but the WER did not improve at all. My WER was around 0.95.
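
For anyone comparing numbers, here is a minimal sketch of how WER is usually computed: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is my own and is not taken from this project, so the project's evaluation code may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not part of this repository.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a WER near 0.95 means almost every reference word is substituted, deleted, or offset by insertions, which is what an essentially untrained model would produce.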

I changed 2 things.

  1. Changed the strides from strides=[5,2,1] to strides=[2,2,1] to avoid an assertion error.
  2. Changed the sample rate from 48000 to 16000.
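
One thing that may be worth checking after a stride change is how much the time axis gets downsampled. The sketch below is only a rough estimate: it assumes each strided layer pads so that its output length is `ceil(input / stride)`, which may not match the kernel sizes and padding this project actually uses, and the helper names are hypothetical. If training uses CTC, as the first post suggests, the number of output frames must be at least the number of target labels.

```python
import math


def total_stride(strides):
    """Overall time downsampling factor of a stack of strided layers."""
    return math.prod(strides)


def output_frames(input_frames, strides):
    """Estimated number of output frames.

    Assumption: every layer is padded so that output = ceil(input / stride).
    """
    frames = input_frames
    for stride in strides:
        frames = math.ceil(frames / stride)
    return frames


def ctc_length_ok(input_frames, strides, target_len):
    """Necessary (not sufficient) condition for a finite CTC loss."""
    return output_frames(input_frames, strides) >= target_len


print(total_stride([5, 2, 1]), total_stride([2, 2, 1]))
print(output_frames(1000, [5, 2, 1]), output_frames(1000, [2, 2, 1]))
```

Under these assumptions, strides=[5,2,1] shrinks the sequence by a factor of 10 and strides=[2,2,1] by a factor of 4, so the two settings give the loss quite different sequence lengths for the same audio.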

I don't know what I need to change...

kouohhashi avatar Nov 08 '20 07:11 kouohhashi