kaldi-gstreamer-server Only works when model is trained using 16khz audio data

Open alx741 opened this issue 6 years ago • 4 comments

Apparently, when the model is trained on audio with a sample rate other than 16 kHz, the decoder fails to decode audio at any sample rate, even after adjusting the corresponding sample-rate parameters in the request to the server (or in the client arguments, for that matter).

This was the issue I was having in #186: my model was originally trained on 44.1 kHz audio data (with a matching MFCC config, --sample-frequency=44100, of course). When I converted all my data to 16 kHz and retrained the model, it worked perfectly.

NOTE: This problem is likely in Kaldi's decoder rather than in kaldi-gstreamer-server, but this is where I first encountered it, so I'm reporting it here to prompt further investigation.
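For anyone trying to reproduce this, it may help to first rule out a plain mismatch between the `--sample-frequency` in the MFCC config and the actual rate of the audio files. Below is a minimal sketch, assuming a Kaldi-style config file with one `--option=value` per line and `#` comments. The function names and file paths are hypothetical and not part of Kaldi or kaldi-gstreamer-server, and the fallback of 16000 assumes Kaldi's default sample frequency. It only compares the two numbers; it does not explain the decoder failure described above.

```python
import wave


def read_conf_sample_rate(conf_path, default=16000):
    """Return the --sample-frequency value from a Kaldi-style config file.

    Falls back to `default` when the option is absent (assumed here to be
    16000, matching Kaldi's default sample frequency).
    """
    with open(conf_path) as f:
        for line in f:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            if line.startswith("--sample-frequency="):
                return int(float(line.split("=", 1)[1]))
    return default


def wav_sample_rate(wav_path):
    """Return the sample rate stored in the header of a PCM WAV file."""
    with wave.open(wav_path, "rb") as w:
        return w.getframerate()


def rates_match(conf_path, wav_path):
    """True when the config's sample frequency equals the WAV file's rate."""
    return read_conf_sample_rate(conf_path) == wav_sample_rate(wav_path)
```

Calling `rates_match("conf/mfcc.conf", "test.wav")` (both paths are placeholders) returns `False` when, for example, the config says 44100 but the file was recorded at 16000.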

Apr 20 '19 19:04 alx741

Just curious: how does the performance (WER) differ between 44.1 kHz and 16 kHz?

Apr 20 '19 19:04 svenha

@svenha It actually improved: WER dropped from ~12% (44.1 kHz) to ~8% (16 kHz).

Apr 20 '19 19:04 alx741

So, 16 kHz is better? This would fit with other reports.

Apr 20 '19 20:04 svenha

Yes, 16 kHz seems to be better.

Apr 20 '19 20:04 alx741
