Whisper-WebUI Silero VAD

First of all, thanks for this project, it's very easy to set up and run locally.

When transcribing in this WebUI, the large-v2 model skips the first three sentences of a file I tested, just as it does here with the Silero VAD turned off: https://huggingface.co/spaces/aadnk/faster-whisper-webui

I guess the VAD is included here (silero_vad.onnx). Is it on by default? Are there any settings I could tweak?

Oct 03 '23 00:10 Trevor-Z

Hi @Trevor-Z! According to faster-whisper, the VAD filter (Silero VAD) is turned off by default, so it was off when you transcribed in this WebUI. I may have to add the VAD filter options to the Advanced Parameters.
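For anyone calling faster-whisper directly in the meantime, here is a rough sketch of turning the filter on through the `vad_filter` and `vad_parameters` arguments of `WhisperModel.transcribe()`. This is not this WebUI's code: `build_vad_kwargs` and `transcribe_with_vad` are hypothetical helper names, and 500 ms is only an example value.

```python
def build_vad_kwargs(enabled=True, min_silence_duration_ms=500):
    """Build keyword arguments for WhisperModel.transcribe() that control the VAD filter.

    Hypothetical helper for illustration; the 500 ms value is an example, not a recommendation.
    """
    kwargs = {"vad_filter": enabled}
    if enabled:
        # vad_parameters is passed through to faster-whisper's Silero VAD options.
        kwargs["vad_parameters"] = {"min_silence_duration_ms": min_silence_duration_ms}
    return kwargs


def transcribe_with_vad(audio_path, model_size="large-v2"):
    """Transcribe a file with the VAD filter enabled (needs faster-whisper installed)."""
    # Imported lazily so build_vad_kwargs() can be used without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, **build_vad_kwargs())
    # segments is a generator; iterating over it runs the actual transcription.
    return [(segment.start, segment.end, segment.text) for segment in segments]
```

Calling `transcribe_with_vad("audio.wav")` (the file name is a placeholder) would return a list of `(start, end, text)` tuples.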

For now, if Whisper doesn't transcribe the first few sentences, it may mean that Whisper recognized them as a "silent" part of the audio.

You can adjust the log_prob_threshold and no_speech_threshold values in the Advanced Parameters tab to adjust how Whisper handles a silent part. You can see how to use these parameters in the wiki.
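To make the interaction between the two values a bit more concrete, here is a minimal sketch of the skip rule as I understand it from the reference Whisper implementation. It is a simplified illustration, not this WebUI's code; `is_skipped_as_silence` is a hypothetical name, and the defaults shown (0.6 and -1.0) are the usual Whisper defaults. `no_speech_threshold` is compared against a probability between 0 and 1, while `log_prob_threshold` is compared against an average log probability, which is zero or negative.

```python
def is_skipped_as_silence(no_speech_prob, avg_logprob,
                          no_speech_threshold=0.6, log_prob_threshold=-1.0):
    """Return True if a segment would be dropped as silence under the assumed rule.

    no_speech_prob: model's probability that the segment has no speech (0..1).
    avg_logprob: average log probability of the decoded tokens (<= 0).
    """
    if no_speech_threshold is None:
        # Silence detection disabled entirely.
        return False
    skip = no_speech_prob > no_speech_threshold
    # A confident decoding overrides the silence decision.
    if log_prob_threshold is not None and avg_logprob > log_prob_threshold:
        skip = False
    return skip
```

Under this rule, raising `no_speech_threshold` toward 1.0 or lowering `log_prob_threshold` makes it less likely that quiet opening sentences are dropped.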

Oct 03 '23 06:10 jhj0517

What's the valid range of values for log_prob_threshold and no_speech_threshold?

Also, is there some way to turn the vad on now? Like changing a parameter in some .py file?

Oct 03 '23 11:10 Trevor-Z

The VAD filter has been added to the WebUI.

You can tune its parameters there. Please feel free to re-open this issue if you have more questions!

Jul 12 '24 08:07 jhj0517
