lhotse VAD workflow with Silero

In terms of which VAD to apply, you can use e.g. Silero VAD: https://github.com/snakers4/silero-vad/wiki/Examples-and-Dependencies#examples

Actually, a workflow/integration in Lhotse would be nice, if somebody is willing to contribute it.
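
For anyone picking this up, here is a rough sketch of the glue such a workflow would need. It assumes the VAD returns speech regions the way Silero's `get_speech_timestamps` does by default (a list of dicts with `start` and `end` given in samples) and turns them into second-based segments. `SpeechSegment` and `timestamps_to_segments` are hypothetical names used only for illustration; the fields mirror a subset of Lhotse's `SupervisionSegment` (`id`, `recording_id`, `start`, `duration`, `channel`), but nothing here is existing Lhotse API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechSegment:
    # Hypothetical stand-in for a Lhotse SupervisionSegment (subset of fields).
    id: str
    recording_id: str
    start: float      # seconds from the start of the recording
    duration: float   # seconds
    channel: int = 0


def timestamps_to_segments(
    timestamps: List[Dict[str, int]],
    recording_id: str,
    sampling_rate: int = 16000,
) -> List[SpeechSegment]:
    """Convert sample-based VAD timestamps into second-based segments.

    Assumes each item looks like {"start": <sample idx>, "end": <sample idx>}.
    """
    segments = []
    for i, ts in enumerate(timestamps):
        num_samples = ts["end"] - ts["start"]
        if num_samples <= 0:
            # Skip empty or inverted regions instead of emitting bad segments.
            continue
        segments.append(
            SpeechSegment(
                id=f"{recording_id}-vad-{i:05d}",
                recording_id=recording_id,
                start=ts["start"] / sampling_rate,
                duration=num_samples / sampling_rate,
            )
        )
    return segments
```

With 16 kHz audio, `timestamps_to_segments([{"start": 16000, "end": 48000}], "rec1")` gives one segment starting at 1.0 s with a duration of 2.0 s. A real workflow would then build actual supervision objects from these values.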

Originally posted by @pzelasko in https://github.com/lhotse-speech/lhotse/issues/726#issuecomment-1522540788

Apr 25 '23 23:04 desh2608

FYI: We have just integrated Silero VAD into sherpa-onnx. All you need to do is run:

pip install sherpa-onnx

You can find two Python examples below:

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/vad-remove-non-speech-segments.py#L104
https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/generate-subtitles.py#L348
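
To make the first example's goal concrete, here is a minimal sketch of what "removing non-speech segments" amounts to once a VAD has produced speech regions. This is plain Python and deliberately does not use the sherpa-onnx API (see the linked scripts for that). `keep_speech` is a hypothetical helper, and it assumes the regions are non-overlapping `(start, end)` sample ranges with `end` exclusive.

```python
from typing import List, Sequence, Tuple


def keep_speech(
    samples: Sequence[float],
    speech_regions: Sequence[Tuple[int, int]],
) -> List[float]:
    """Return only the samples that fall inside the given speech regions.

    Assumes regions are (start, end) sample indices, end exclusive,
    and that they do not overlap each other.
    """
    speech: List[float] = []
    for start, end in sorted(speech_regions):
        # Clamp to the valid range so a region that runs past the
        # end of the audio does not raise or wrap around.
        start = max(0, start)
        end = min(len(samples), end)
        if end > start:
            speech.extend(samples[start:end])
    return speech
```

For example, `keep_speech(list(range(10)), [(2, 4), (7, 9)])` keeps the samples at indices 2, 3, 7 and 8 and drops the rest.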

We also have a Hugging Face space that uses Silero VAD + non-streaming ASR models to generate subtitles for video and audio files.

The VAD-related code for that space can be found at https://huggingface.co/spaces/k2-fsa/generate-subtitles-for-videos/blob/main/decode.py

Sep 20 '23 12:09 csukuangfj

Cool! Maybe it would be interesting to create Lhotse workflows that leverage sherpa (e.g. at the start they launch a server subprocess and then spawn N clients to process data with sherpa).
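
A rough sketch of that pattern using only the Python standard library: start the server as a subprocess, fan work out to N client threads, and shut the server down afterwards. Everything here is an assumption made for illustration. `running_server`, `process_with_clients` and `DUMMY_SERVER_CMD` are hypothetical names, the server command is a placeholder (not a real sherpa invocation), and `client_fn` stands in for whatever code would actually talk to the server.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# Placeholder "server": a Python process that just sleeps.
# A real workflow would put the actual server command here.
DUMMY_SERVER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


@contextmanager
def running_server(cmd):
    """Start the server subprocess and make sure it is stopped on exit."""
    proc = subprocess.Popen(cmd)
    try:
        # A real implementation would wait here until the server
        # is ready to accept connections (e.g. by polling its port).
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def process_with_clients(server_cmd, items, client_fn, num_clients=4):
    """Run client_fn over items with num_clients threads while the server is up.

    Results are returned in the same order as the input items.
    """
    with running_server(server_cmd):
        with ThreadPoolExecutor(max_workers=num_clients) as pool:
            return list(pool.map(client_fn, items))
```

Usage would look like `process_with_clients(DUMMY_SERVER_CMD, paths, client_fn, num_clients=8)`, where `client_fn` takes one item (say, an audio path) and returns its result. Threads are used here on the assumption that clients mostly wait on the server; CPU-heavy clients would want processes instead.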

Sep 21 '23 14:09 pzelasko