Diarization Efficacy
Hi, I love the work you are doing, especially the diarization functionality. I am using your most recent PRs. I have been trying to run diarization on a simple one-to-one conversation between quite different voices, but the speaker diarization is a bit choppy: diart groups some dialogue under one speaker when it should be attributed to multiple speakers (a 15-minute conversation).
What would be a way to optimise the diarizer? What test audio do you use? And how do you measure the efficacy of the diarization process? DBSCAN or something else?
Furthermore, is there a way to collect all diart centroids and compare them to a known centroid (the speaker is known), in real time?
Hi!
> What test audio do you use?
I use AI-generated audio created with Vidnoz, with different speakers, and BlackHole to redirect my computer's output audio to its input. This means that:
- the audio is really clear, because it does not come from a mic
- the voices are quite different (including male and female)
> What would be a way to optimise the diarizer?
diart itself provides some options for this. The most evident solution, I would say, would be to extend the size of the 5-second batches, but that would have a computational cost and increase latency, as mentioned in https://github.com/juanmc2005/diart/blob/main/paper.pdf
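To give an intuition for why batch size and thresholds matter, here is a minimal sketch of threshold-based incremental clustering, which is the general family of technique an online diarizer uses to decide "existing speaker vs. new speaker". This is NOT diart's actual code; the `delta_new` name is borrowed from diart's configurable distance threshold, but the function, the running-average update, and the 2-D embeddings are simplified assumptions for illustration. Longer batches yield more stable embeddings, so borderline chunks land more reliably on the correct side of the threshold.

```python
import numpy as np

def assign_speaker(embedding, centroids, delta_new=1.0):
    """Assign an embedding to the nearest centroid, or create a new
    speaker when every centroid is farther than delta_new.

    Simplified illustration of incremental clustering; not diart internals.
    """
    if centroids:
        dists = [np.linalg.norm(embedding - c) for c in centroids]
        best = int(np.argmin(dists))
        if dists[best] <= delta_new:
            # Known speaker: refine the centroid with a running average.
            centroids[best] = 0.5 * (centroids[best] + embedding)
            return best
    # No centroid is close enough: enroll a new speaker.
    centroids.append(embedding.copy())
    return len(centroids) - 1

# Toy 2-D "embeddings": two well-separated voices.
centroids = []
a = assign_speaker(np.array([0.0, 0.0]), centroids)  # first chunk -> speaker 0
b = assign_speaker(np.array([5.0, 5.0]), centroids)  # far away -> new speaker 1
c = assign_speaker(np.array([0.3, 0.2]), centroids)  # close to 0 -> speaker 0
```

If `delta_new` is too large, distinct voices get merged under one speaker (the choppy grouping you describe); too small, and one voice splinters into several labels.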
> Furthermore, is there a way to collect all diart centroids and compare them to a known centroid (the speaker is known), in real time?
Maybe. There are a lot of visualisation options in diart, but for some reason I cannot get the graphs working with my implementation of the source and the hook, so I cannot tell you how to display that yet. I am interested, and if I have time I will try to debug it.
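Setting visualisation aside, once you can get an embedding or centroid per detected speaker out of the pipeline, comparing it against an enrolled reference is just a similarity check. A minimal sketch, assuming you already have the centroid vectors in hand (the speaker names, the 3-D vectors, and the 0.7 threshold below are all illustrative assumptions, not diart values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_known_speaker(centroid, known_speakers, threshold=0.7):
    """Return the best-matching enrolled speaker name, or None if no
    reference centroid is similar enough."""
    best_name, best_sim = None, threshold
    for name, reference in known_speakers.items():
        sim = cosine_similarity(centroid, reference)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name

# Enrolled reference centroids (toy 3-D vectors for illustration).
known = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}

print(match_known_speaker(np.array([0.9, 0.1, 0.0]), known))  # -> alice
```

Run per update as centroids evolve, this gives you a real-time "which known speaker is this" check; the threshold would need tuning on your own audio.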