Long-Form transcription with Faster Whisper

Open 9throok opened this issue 1 year ago • 3 comments

Hi, I have been working with Faster-Whisper and trying to use the Distil-Whisper model. However, Distil-Whisper works on 30-second audio chunks, and when I use it with Faster-Whisper, only the first 30 seconds are transcribed.

How can it be used with the faster-whisper implementation?

9throok, Nov 13 '23 05:11

Hey @9throok - cool to see that you're using Distil-Whisper in combination with Faster-Whisper! I believe the .transcribe method in Faster-Whisper handles the long-form generation algorithm: https://github.com/guillaumekln/faster-whisper#usage Is this the API that you've been using? If you could share a reproducible code snippet that showcases the behaviour you're seeing that would be great, thanks!
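For anyone putting together such a reproduction, here is a minimal sketch that reports how far into the audio the transcript actually reaches. It assumes the `faster-whisper` package is installed. `transcript_coverage` and `run_repro` are hypothetical helper names, and `model_name` stands in for whatever model name or converted CTranslate2 model directory you are loading.

```python
def transcript_coverage(segments, audio_duration):
    """Return (last_end_seconds, fraction_of_audio_covered).

    `segments` is any iterable of objects with an `.end` attribute in seconds,
    such as the segments yielded by WhisperModel.transcribe().
    """
    last_end = 0.0
    for seg in segments:
        last_end = max(last_end, seg.end)
    fraction = last_end / audio_duration if audio_duration > 0 else 0.0
    return last_end, fraction


def run_repro(audio_path, model_name):
    """Transcribe one file and report how much of it produced output.

    Assumption: `model_name` is a model name or a path to a converted
    CTranslate2 model directory that your faster-whisper version can load.
    """
    # Imported lazily so transcript_coverage() works without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, language="en")

    # transcribe() returns a lazy generator; decoding runs while iterating.
    segments = list(segments)
    for seg in segments:
        print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text}")

    return transcript_coverage(segments, info.duration)
```

If the reported fraction stays close to `30 / info.duration` on a long file, that matches the behaviour described in the issue and gives a concrete number to post alongside the snippet.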

sanchit-gandhi, Nov 13 '23 14:11

@9throok, any update on the issue that you mentioned?

murdadesmaeeli, Dec 29 '23 20:12

> Hi, I have been working with Faster-Whisper and trying to use the Distil-Whisper model. However, Distil-Whisper works on 30-second audio chunks, and when I use it with Faster-Whisper, only the first 30 seconds are transcribed.

I had the same issue: after the first chunk there was nothing in the output. Looking at the debug output, the distil model just hallucinated non-stop after the first chunk. The solution is to disable the context prompt; an initial prompt has a negative effect too.
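A minimal sketch of that workaround, assuming "context prompt" corresponds to the `condition_on_previous_text` argument of `WhisperModel.transcribe` in faster-whisper, and that your installed version recognises the `distil-large-v2` model name (otherwise pass the path to a converted model). The two function names are hypothetical helpers, not part of the library.

```python
def distil_transcribe_options(language="en"):
    """Keyword arguments for WhisperModel.transcribe() with prompting turned off."""
    return {
        "language": language,
        "beam_size": 5,
        # Do not feed the previous chunk's text back in as a prompt.
        "condition_on_previous_text": False,
        # Leave the initial prompt unset, since it was reported to hurt as well.
        "initial_prompt": None,
    }


def transcribe_long_form(audio_path, model_name="distil-large-v2"):
    """Transcribe a long file and return a list of (start, end, text) tuples.

    Assumption: `model_name` is loadable by your faster-whisper version.
    """
    # Imported lazily so the options helper works without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, **distil_transcribe_options())

    # The generator is consumed here, which is when decoding actually runs.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe_long_form("audio.mp3")` should then return segments covering the whole file rather than only the first 30 seconds, if the cause is the one described above.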

> How can it be used with the faster-whisper implementation?

Now it has official support -> https://github.com/SYSTRAN/faster-whisper/commit/ad3c83045bc0748b744e064ddfda680c86662e7e

Or you can use the standalone executable -> https://github.com/Purfview/whisper-standalone-win

Purfview, Jan 26 '24 15:01