diart

Support for Silero VAD

Status: Open. tjainsuki opened this issue 9 months ago • 6 comments

Hi Developers,

Thank you for your amazing work on this project!

I was wondering if there's a way to use Silero VAD. I noticed that pyannote VAD is supported, but Silero VAD isn't. Have you tried integrating Silero VAD, and if so, how do its accuracy and latency compare?

I also tried adding Silero VAD with custom parameters, but unfortunately, I couldn’t get it to work. Any guidance or suggestions would be greatly appreciated!

tjainsuki, Feb 13 '25 03:02

Hi @tjainsuki! How were you thinking of integrating Silero? As an alternative VAD pipeline? I haven't tried integrating it, but if you have an idea, I'd be glad to work on a PR with you to get it working.

juanmc2005, Feb 13 '25 15:02

Hi @juanmc2005,

I'm suggesting Silero VAD as an alternative to pyannote VAD.

I've written a basic script to implement this, but I'm not very familiar with the internal workings of the diart library. My script has a bug in how it processes and returns the parameters. Would you be able to help me fix it?

Attached is the .py file, saved with a .txt extension.

Thanks!

sd_silero.txt

csetanmayjain, Feb 13 '25 16:02

The thing is, Silero only makes sense in a VoiceActivityDetection pipeline. It would be a segmentation model that we can't use as a drop-in replacement for a SegmentationModel. In this case, I think we could introduce a new type of model, e.g. VADModel. That way, one VADModel implementation could leverage pyannote and another could rely on Silero. However, you wouldn't be able to use a VADModel in a SpeakerDiarization pipeline (as expected).
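A minimal sketch of the VADModel idea, assuming a pure-Python interface in which every implementation returns one speech probability per output frame and reports its frame step in seconds. VADModel is the name proposed in the comment; SegmentationVAD, WindowedVAD, frame_step and the injected callables are hypothetical, and no real pyannote or Silero calls are made. The pyannote-style wrapper collapses per-speaker scores with a max over speakers, while the Silero-style wrapper slices the chunk into fixed-size windows (512 samples at 16 kHz is an assumption based on recent Silero releases; check the version you use).

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence


class VADModel(ABC):
    """Hypothetical interface: map a mono chunk to per-frame speech probabilities."""

    @property
    @abstractmethod
    def frame_step(self) -> float:
        """Duration in seconds between consecutive output frames."""

    @abstractmethod
    def __call__(self, waveform: Sequence[float], sample_rate: int) -> List[float]:
        """Return one speech probability in [0, 1] per output frame."""


class SegmentationVAD(VADModel):
    """Hypothetical wrapper for a pyannote-style model with per-speaker frame scores."""

    def __init__(
        self,
        segment_fn: Callable[[Sequence[float], int], List[List[float]]],
        frame_step: float,
    ):
        self._segment_fn = segment_fn  # returns scores shaped [frames][speakers]
        self._frame_step = frame_step

    @property
    def frame_step(self) -> float:
        return self._frame_step

    def __call__(self, waveform: Sequence[float], sample_rate: int) -> List[float]:
        # A frame contains speech if any speaker is active: max over speakers.
        scores = self._segment_fn(waveform, sample_rate)
        return [max(speakers, default=0.0) for speakers in scores]


class WindowedVAD(VADModel):
    """Hypothetical wrapper for a Silero-style model that scores one window per call."""

    def __init__(
        self,
        window_fn: Callable[[Sequence[float], int], float],
        window_size: int = 512,  # assumed Silero window at 16 kHz
        sample_rate: int = 16000,
    ):
        self._window_fn = window_fn
        self._window_size = window_size
        self._sample_rate = sample_rate

    @property
    def frame_step(self) -> float:
        return self._window_size / self._sample_rate

    def __call__(self, waveform: Sequence[float], sample_rate: int) -> List[float]:
        if sample_rate != self._sample_rate:
            raise ValueError(f"expected {self._sample_rate} Hz audio, got {sample_rate} Hz")
        probs: List[float] = []
        for start in range(0, len(waveform), self._window_size):
            window = list(waveform[start:start + self._window_size])
            # Zero-pad the last window so every call sees exactly window_size samples.
            window += [0.0] * (self._window_size - len(window))
            probs.append(float(self._window_fn(window, sample_rate)))
        return probs
```

Because neither wrapper subclasses SegmentationModel, a SpeakerDiarization pipeline could not accept one by mistake, which matches the constraint described in the comment.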

We should also think about how to design the interface of VADModel and how to change the inner workings of VoiceActivityDetection so that both types of model are compatible, because they work in very different ways.
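One way to make both model types compatible, assuming every VADModel reports speech probabilities at a known frame step, is to keep all thresholding inside VoiceActivityDetection so it no longer depends on how a particular model produces its scores. The hypothetical binarize helper below applies hysteresis thresholding (a region opens above onset and closes below offset) to such a probability sequence; it is an illustration, not diart's current code.

```python
from typing import List, Optional, Sequence, Tuple


def binarize(
    probs: Sequence[float],
    frame_step: float,
    onset: float = 0.5,
    offset: float = 0.5,
    start_time: float = 0.0,
) -> List[Tuple[float, float]]:
    """Turn per-frame speech probabilities into (start, end) regions in seconds.

    Hysteresis: a region opens when a probability rises above `onset`
    and closes only when one drops below `offset` (requires offset <= onset).
    """
    if offset > onset:
        raise ValueError("offset must not exceed onset")
    regions: List[Tuple[float, float]] = []
    region_start: Optional[float] = None
    for i, p in enumerate(probs):
        t = start_time + i * frame_step  # start time of frame i
        if region_start is None and p > onset:
            region_start = t
        elif region_start is not None and p < offset:
            regions.append((region_start, t))
            region_start = None
    # Close a region that is still active at the end of the chunk.
    if region_start is not None:
        regions.append((region_start, start_time + len(probs) * frame_step))
    return regions
```

With a lower offset, a brief dip (0.45 here) does not split a speech region: `binarize([0.1, 0.6, 0.45, 0.2, 0.7, 0.8], 0.5, onset=0.5, offset=0.3)` yields `[(0.5, 1.5), (2.0, 3.0)]`.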

juanmc2005, Feb 13 '25 16:02

That makes sense. I'll try to modify the VAD to support Silero when I have the bandwidth.

I'll keep you updated!

csetanmayjain, Feb 13 '25 17:02

@juanmc2005 Could we please include this feature in the new PR?

sprath9, Feb 21 '25 17:02

@sprath9 I'd be happy to review and merge a PR supporting Silero VAD. @csetanmayjain feel free to open a draft PR. I can always guide you through the implementation to get it merged asap.

juanmc2005, Feb 25 '25 09:02