WhisperLiveKit
Diarization quality
I tested the service now and I noticed that the diarization is pretty bad. It only works when I use `--backend whisper_timestamped`, and it often splits one speaker into multiple. Is this the current state of the art / expected from the model, or am I doing something wrong?
Hi! Yes, open-source live diarization solutions often struggle, especially at the start of a conversation. You can try using Diart directly:
```python
from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference

# Default streaming diarization pipeline, fed from the microphone
pipeline = SpeakerDiarization()
mic = MicrophoneAudioSource()

# do_plot=True displays the predicted speakers live
inference = StreamingInference(pipeline, mic, do_plot=True)
prediction = inference()
```
This lets you watch live speaker identification and test the different models listed here: https://github.com/juanmc2005/diart?tab=readme-ov-file#-models
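On the "splits one speaker into multiple" issue: as a workaround, a simple post-processing pass can merge consecutive segments attributed to the same speaker. This is only a sketch, not a WhisperLiveKit or Diart API; it assumes segments are represented as plain `(start, end, speaker)` tuples:

```python
# Hypothetical post-processing: merge consecutive segments that share a
# speaker label, smoothing over spurious speaker switches.
# Segments are assumed to be (start, end, speaker) tuples.

def merge_segments(segments, max_gap=0.5):
    """Merge adjacent same-speaker segments when the silence between
    them is shorter than max_gap seconds."""
    merged = []
    for start, end, speaker in segments:
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, speaker)  # extend previous segment
        else:
            merged.append((start, end, speaker))
    return merged

segments = [(0.0, 1.2, "A"), (1.3, 2.0, "A"), (2.1, 3.0, "B"), (3.8, 4.5, "B")]
print(merge_segments(segments))
# [(0.0, 2.0, 'A'), (2.1, 3.0, 'B'), (3.8, 4.5, 'B')]
```

This won't fix wrong speaker labels, but it reduces the visual fragmentation when the model briefly flip-flops on a single speaker.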
If you notice any difference in results between using Diart directly and through WhisperLiveKit, please let me know!