
VAD segment length cap at around 20s

Open chiiyeh opened this issue 1 year ago • 1 comment

Hi, I was playing around with the VAD model and noticed that the maximum speech segment duration is capped at around 20s regardless of the buffer size. I took a look at the code and saw that the limit is hardcoded on this line:

https://github.com/k2-fsa/sherpa-onnx/blob/de04b3b9bfc6d48a8ac340e00083d9fd5411b81e/sherpa-onnx/csrc/voice-activity-detector.cc#L156C7-L156C29

Would be nice if this can be a parameter that can be modified. My instinct was that the buffer size controls the maximum duration, but that turns out to be wrong. Not sure if this is the default behaviour for the original silero vad as well.
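To make the request concrete, here is a minimal sketch of what a configurable cap could look like. It is an illustration only: `CapConfig`, `max_speech_duration_s`, and `cap_segments` are hypothetical names, not part of the sherpa-onnx API, and the sketch does not reproduce how the linked line actually enforces the limit. It simply splits any detected segment that exceeds a user-chosen duration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CapConfig:
    # Hypothetical settings; these names are assumptions, not sherpa-onnx fields.
    sample_rate: int = 16000
    max_speech_duration_s: float = 20.0


def cap_segments(
    segments: List[Tuple[int, int]], config: CapConfig
) -> List[Tuple[int, int]]:
    """Split every (start, end) sample range that is longer than the cap."""
    max_len = int(config.sample_rate * config.max_speech_duration_s)
    if max_len <= 0:
        raise ValueError("max_speech_duration_s must yield at least one sample")

    capped: List[Tuple[int, int]] = []
    for start, end in segments:
        # Emit full-length chunks while the remainder still exceeds the cap.
        while end - start > max_len:
            capped.append((start, start + max_len))
            start += max_len
        # Keep whatever is left, skipping empty ranges.
        if end > start:
            capped.append((start, end))
    return capped
```

With the default of 20 seconds, a 45 second segment at 16 kHz is split into chunks of 20, 20, and 5 seconds; raising `max_speech_duration_s` to 30 yields two chunks instead.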

chiiyeh avatar Jul 16 '24 00:07 chiiyeh

> Not sure if this is the default behaviour for the original silero vad as well.

It is not the default behavior of silero vad.

We added this constraint because many users complained that the VAD gave them very long segments.

Typically, you won't get a segment longer than 20 seconds if there are longer pauses in your audio.


> Would be nice if this can be a parameter that can be modified.

We accept PRs to change that. Would you like to contribute?

csukuangfj avatar Jul 16 '24 02:07 csukuangfj