whisper.cpp

Zero-pad the audio, not the spectrogram

Open • ggerganov opened this issue 1 year ago • 3 comments

Makes sense. Hopefully this will reduce hallucinations.

https://github.com/openai/whisper/discussions/838#discussioncomment-5222689
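As a rough illustration of the idea in the title, here is a minimal Python sketch of padding at the audio level: zeros are appended to the raw samples before any spectrogram is computed. The 16 kHz sample rate, the 30-second window, and the helper name `pad_audio` are assumptions made for this example; none of them is taken from the whisper.cpp code.

```python
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input
CHUNK_SECONDS = 30    # assumption: 30-second model window


def pad_audio(samples: list[float],
              target_len: int = SAMPLE_RATE * CHUNK_SECONDS) -> list[float]:
    """Return a copy of `samples` with trailing zeros so it has at least `target_len` entries.

    Hypothetical helper: the featurizer then sees true digital silence,
    instead of a spectrogram that was padded after the fact.
    """
    missing = max(0, target_len - len(samples))
    return samples + [0.0] * missing


# Usage: pad a short clip up to five samples for demonstration.
padded = pad_audio([0.1, 0.2], target_len=5)
```

Audio that is already at least `target_len` samples long is returned unchanged in content.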

ggerganov, Mar 07 '23 04:03

(Removed previous incorrect assumption about the featurization).

As long as you don't change the featurization, it looks like you can just switch your padding to -1.5 if it's more convenient to keep padding at the spectrogram level.
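A hedged sketch of where -1.5 plausibly comes from, assuming the featurization floors the mel power at 1e-10, takes log10, and then rescales with (x + 4) / 4. Those constants reflect my reading of the reference Whisper featurizer and are not stated in this thread. The sketch also leaves out any dynamic-range clamp relative to the loudest frame, which could raise the floor for loud clips.

```python
import math

POWER_FLOOR = 1e-10  # assumption: clamp applied before log10
OFFSET = 4.0         # assumption: normalization offset
SCALE = 4.0          # assumption: normalization scale


def normalized_log_mel(power: float) -> float:
    """Map a mel-band power to its normalized log value under the assumptions above."""
    log_spec = math.log10(max(power, POWER_FLOOR))
    return (log_spec + OFFSET) / SCALE


# Zero-valued audio has zero power, so it hits the floor:
# log10(1e-10) = -10, and (-10 + 4) / 4 = -1.5
silence_value = normalized_log_mel(0.0)
```

Under these assumptions, a spectrogram frame filled with -1.5 is what zero-padded audio would produce away from the boundary with real audio, which is why the two padding strategies can coincide.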

Otherwise, it seems like your speed_up could interfere with the padding.

lunixbochs, Mar 07 '23 21:03

Oh, this might be a game-changer. I haven't been able to find a way to reduce hallucinations so far.

meakbiyik, Mar 13 '23 19:03

This was merged into Whisper as of 20230307. Is there a chance we'll see it in whisper.cpp soon?

albino1, Mar 23 '23 02:03