Regression in accuracy

Open meakbiyik opened this issue 2 years ago • 0 comments

I wanted to create a separate issue for the problems I described in #354. Since 385236d1d3d7a0228f5279657938ae5f1313ca94, I have seen a severe regression in WER on noisy audio, of around 10-20%. I am attaching a noisy German audio clip with which I can reproduce this.

https://user-images.githubusercontent.com/23424198/212912717-27d4a6fa-3f34-4113-9877-dd355555fefe.mp4

The command I use for both the master branch and the commit above is:

./bin/stream -l de -m ./gglm-small.bin -kc -ac 512 -t 4 --step 1500 --length 10000

The expected transcription is:

Wir wollen mehr Demokratie wagen. Wir werden unsere Arbeitsweise öffnen und dem kritischen Bedürfnis nach Information Genüge tun.
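
To put a number on the regression, the output of each build can be scored against this reference. The following is a minimal sketch of a word-level WER computation (edit distance over words divided by the reference length). The normalization (lowercasing, stripping ASCII punctuation) and the function names are assumptions for illustration, not the scoring behind the 10-20% figure above.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


REFERENCE = (
    "Wir wollen mehr Demokratie wagen. Wir werden unsere Arbeitsweise öffnen "
    "und dem kritischen Bedürfnis nach Information Genüge tun."
)
```

Calling `wer(REFERENCE, transcript)` on the transcript from each build gives two directly comparable numbers for the same clip.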

As noted in the previous issue, I suspect that the main problem is not the temperature or keep-context. My bet would be on either the loss of precision from the 32-bit to 16-bit float conversions, or some bug related to them, since this could directly hurt noise robustness (and possibly the overall quality of the tiny models) without causing problems for high-SNR data and bigger models.
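
For a sense of scale, here is a small sketch of how much a single value loses when rounded to IEEE 754 half precision, using only Python's `struct` module. It is not whisper.cpp code and does not confirm the hypothesis above; the helper names `to_fp16` and `relative_error` are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    # The "e" format is binary16; packing raises OverflowError above 65504.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_error(x: float) -> float:
    """Relative rounding error introduced by storing x as fp16."""
    return abs(to_fp16(x) - x) / abs(x)


if __name__ == "__main__":
    for value in (0.1, 1.0, 3.14159, 1000.5, 2049.0):
        print(value, to_fp16(value), relative_error(value))
```

Half precision keeps an 11-bit significand, so each stored value in the normal range carries a relative error of up to roughly 2^-11 (about 0.05%). Whether errors of that size accumulate enough to explain the regression would have to be checked against the actual conversion code.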

meakbiyik, Jan 17 '23 13:01