Diarization Efficacy
Hi, I love the work you are doing, especially the diarization functionality. I am using your most recent PRs. I have been trying to run diarization on a simple one-to-one conversation between quite different voices, but the speaker diarization is a bit choppy: diart groups some dialogue under one speaker when it should be attributed to multiple speakers (a 15-minute conversation).
What would be a way to optimise the diarizer? What test audio do you use? And how do you measure the efficacy of the diarization process? DBSCAN or something else?
Furthermore, is there a way to collect all diart centroids and compare them to a known centroid (the speaker is known), in real time?
Hi!
> What test audio do you use?
I use AI-generated audio created with Vidnoz, with different speakers, and BlackHole to redirect my computer's output audio to its input. This means that:
- the audio is really clear, because it does not come from a mic
- the voices are quite different (including male and female)
> What would be a way to optimise the diarizer?
diart itself provides some options for this. The most evident solution, I would say, would be to extend the size of the 5-second batches, but that would have a computational cost and increase latency, as mentioned in https://github.com/juanmc2005/diart/blob/main/paper.pdf
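To give an intuition for why batch size and thresholds matter, here is a minimal sketch of threshold-based incremental clustering, which is the general family of technique an online diarizer uses to decide "existing speaker vs. new speaker". This is NOT diart's actual code; the `delta_new` name is borrowed from diart's configurable distance threshold, but the function, the running-average update, and the 2-D embeddings are simplified assumptions for illustration. Longer batches yield more stable embeddings, so borderline chunks land more reliably on the correct side of the threshold.

```python
import numpy as np

def assign_speaker(embedding, centroids, delta_new=1.0):
    """Assign an embedding to the nearest centroid, or create a new
    speaker when every centroid is farther than delta_new.

    Simplified illustration of incremental clustering; not diart internals.
    """
    if centroids:
        dists = [np.linalg.norm(embedding - c) for c in centroids]
        best = int(np.argmin(dists))
        if dists[best] <= delta_new:
            # Known speaker: refine the centroid with a running average.
            centroids[best] = 0.5 * (centroids[best] + embedding)
            return best
    # No centroid is close enough: enroll a new speaker.
    centroids.append(embedding.copy())
    return len(centroids) - 1

# Toy 2-D "embeddings": two well-separated voices.
centroids = []
a = assign_speaker(np.array([0.0, 0.0]), centroids)  # first chunk -> speaker 0
b = assign_speaker(np.array([5.0, 5.0]), centroids)  # far away -> new speaker 1
c = assign_speaker(np.array([0.3, 0.2]), centroids)  # close to 0 -> speaker 0
```

If `delta_new` is too large, distinct voices get merged under one speaker (the choppy grouping you describe); too small, and one voice splinters into several labels.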
> Furthermore, is there a way to collect all diart centroids and compare them to a known centroid (the speaker is known), in real time?
Maybe. There are a lot of visualisation options in diart, but for some reason I cannot get the graphs working with my implementation of the source and the hook, so I cannot tell you how to display that yet. I am interested, and if I have time I will try to debug it.
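Setting visualisation aside, once you can get an embedding or centroid per detected speaker out of the pipeline, comparing it against an enrolled reference is just a similarity check. A minimal sketch, assuming you already have the centroid vectors in hand (the speaker names, the 3-D vectors, and the 0.7 threshold below are all illustrative assumptions, not diart values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_known_speaker(centroid, known_speakers, threshold=0.7):
    """Return the best-matching enrolled speaker name, or None if no
    reference centroid is similar enough."""
    best_name, best_sim = None, threshold
    for name, reference in known_speakers.items():
        sim = cosine_similarity(centroid, reference)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name

# Enrolled reference centroids (toy 3-D vectors for illustration).
known = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}

print(match_known_speaker(np.array([0.9, 0.1, 0.0]), known))  # -> alice
```

Run per update as centroids evolve, this gives you a real-time "which known speaker is this" check; the threshold would need tuning on your own audio.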