kaldi-gstreamer-server

Only works when the model is trained using 16 kHz audio data

Open alx741 opened this issue 6 years ago • 4 comments

Apparently, when the model is trained on audio with a sample rate other than 16 kHz, the decoder fails to decode audio at any sample rate, even after adjusting the corresponding sample rate parameters in the request to the server (or in the client arguments, for that matter).

This was the issue I was having in #186: my model was originally trained with 44.1 kHz audio data (with a matching MFCC config, --sample-frequency=44100, of course). When I converted all my data to 16 kHz and re-trained the model, it worked perfectly.
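For anyone hitting the same thing, a quick sanity check is to compare the sample rate declared in each WAV header against the rate the model was trained on. The following is only a minimal sketch using Python's standard `wave` module; `MODEL_SAMPLE_RATE_HZ`, `wav_sample_rate`, `matches_model_rate`, and `make_silent_wav` are hypothetical names made up for illustration, not part of Kaldi or kaldi-gstreamer-server, and it does not resample anything.

```python
import io
import wave

# Assumption: the sample rate (Hz) the acoustic model was trained on.
MODEL_SAMPLE_RATE_HZ = 16000


def wav_sample_rate(wav_file):
    """Return the sample rate (Hz) declared in a WAV header.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        return wav.getframerate()


def matches_model_rate(wav_file, model_rate_hz=MODEL_SAMPLE_RATE_HZ):
    """True if the WAV header's sample rate equals the model's rate."""
    return wav_sample_rate(wav_file) == model_rate_hz


def make_silent_wav(rate_hz, seconds=0.1):
    """Build a short mono 16-bit silent WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for rate in (44100, 16000):
        ok = matches_model_rate(make_silent_wav(rate))
        print(f"{rate} Hz matches model rate: {ok}")
```

This only reads the header, so it flags a mismatch (such as 44.1 kHz recordings against a 16 kHz model) without saying anything about why decoding fails.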

NOTE: This problem is likely in Kaldi's decoder rather than in kaldi-gstreamer-server, but this is where I first encountered it, so I'm reporting it here to prompt further investigation.

alx741 avatar Apr 20 '19 19:04 alx741

Just curious: how does the performance (WER) differ between 44.1 kHz and 16 kHz?

svenha avatar Apr 20 '19 19:04 svenha

@svenha It actually improved: WER dropped from ~12% (44.1 kHz) to ~8% (16 kHz).

alx741 avatar Apr 20 '19 19:04 alx741

So, 16 kHz is better? This would fit with other reports.

svenha avatar Apr 20 '19 20:04 svenha

> So, 16 kHz is better? This would fit with other reports.

Yes, 16 kHz seems to be better.

alx741 avatar Apr 20 '19 20:04 alx741