Whisper-WebUI
Silero VAD
First of all, thanks for this project, it's very easy to set up and run locally.
When I transcribe with this WebUI, the large-v2 model skips the first three sentences of a file I tested, just as it does over here with the Silero VAD turned off: https://huggingface.co/spaces/aadnk/faster-whisper-webui
I guess the VAD is included here (silero_vad.onnx). Is it on by default? Are there any settings I could tweak?
Hi @Trevor-Z! According to faster-whisper, the VAD filter (Silero VAD) is turned off by default, so it was off when you transcribed in this WebUI. I may have to add the VAD filter options to the Advanced Parameters.
For now, if Whisper doesn't transcribe the first few sentences, it may mean that Whisper recognized them as a "silent" part of the audio.
You can adjust the log_prob_threshold and no_speech_threshold values in the Advanced Parameters tab to change how Whisper handles silent parts.
You can see how to use these parameters in the wiki.
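To make the interaction between the two thresholds concrete, here is a minimal sketch of the skip rule as I understand it from the reference Whisper transcription loop. The function name `should_skip_segment` is hypothetical and is not part of Whisper or faster-whisper; the defaults of 0.6 and -1.0 match the library defaults. Since no_speech_threshold is compared against a probability, values between 0 and 1 are the meaningful ones, and since log_prob_threshold is compared against an average log probability, it is normally zero or negative.

```python
from typing import Optional


def should_skip_segment(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: Optional[float] = 0.6,
    log_prob_threshold: Optional[float] = -1.0,
) -> bool:
    """Hypothetical helper mirroring how a window gets treated as silence."""
    # With no_speech_threshold disabled, nothing is skipped as silence.
    if no_speech_threshold is None:
        return False

    # Candidate for skipping: the model thinks the window is probably silent.
    skip = no_speech_prob > no_speech_threshold

    # A confident decoding (average log probability above the threshold)
    # overrides the silence decision, so the text is kept.
    if log_prob_threshold is not None and avg_logprob > log_prob_threshold:
        skip = False

    return skip


if __name__ == "__main__":
    # High no-speech probability and low decoding confidence: skipped.
    print(should_skip_segment(no_speech_prob=0.9, avg_logprob=-1.5))
    # Same no-speech probability, but a confident decoding: kept.
    print(should_skip_segment(no_speech_prob=0.9, avg_logprob=-0.3))
```

Under this rule, raising no_speech_threshold toward 1 or lowering log_prob_threshold (making it more negative) both make it less likely that speech at the start of a file is dropped as silence.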
What's the valid range of values for log_prob_threshold and no_speech_threshold?
Also, is there some way to turn the vad on now? Like changing a parameter in some .py file?
The VAD filter has been added to the WebUI.
You can tune its parameters there. Please feel free to re-open this issue if needed!
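For anyone calling faster-whisper directly rather than through the WebUI, a rough sketch of turning the VAD on looks like the following. The `vad_filter` and `vad_parameters` arguments are from faster-whisper's `transcribe` method; the helper names `build_transcribe_kwargs` and `transcribe_with_vad` are made up for illustration, and the 500 ms silence value is only an example, not a recommendation from this project.

```python
from typing import Any, Dict, List, Tuple


def build_transcribe_kwargs(
    use_vad: bool = True,
    min_silence_duration_ms: int = 500,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the VAD-related transcribe() arguments."""
    kwargs: Dict[str, Any] = {"vad_filter": use_vad}
    if use_vad:
        # Passed through to the Silero VAD options in faster-whisper.
        kwargs["vad_parameters"] = {
            "min_silence_duration_ms": min_silence_duration_ms,
        }
    return kwargs


def transcribe_with_vad(
    audio_path: str,
    model_size: str = "large-v2",
) -> List[Tuple[float, float, str]]:
    """Hypothetical wrapper: transcribe one file with the VAD filter on."""
    # Imported lazily so the options helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, **build_transcribe_kwargs())
    # segments is a generator; iterating it is what runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]


if __name__ == "__main__":
    print(build_transcribe_kwargs())
```

Calling `transcribe_with_vad("audio.mp3")` (with a placeholder file name) would download the model on first use, so it is left out of the snippet's main block.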