
Periodic Silent Speaker Detection

Open QuentinFuxa opened this issue 9 months ago • 1 comments

First of all, thanks for the project! I’m using it in a live Speech-to-Text (STT) + diarization setup: https://github.com/QuentinFuxa/whisper_streaming_web

I am testing my pipeline using BlackHole on macOS, which routes the output audio to the input. This allows me to repeatedly test STT and diarization with the exact same audio. When I start the system, virtually no sound is being produced. However, when running a simple script with diart, I observe speaker 0 being detected periodically, even though no meaningful audio is present.

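from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference
from diart.sinks import RTTMWriter

# Default streaming diarization pipeline, fed from the default audio input
pipeline = SpeakerDiarization()
mic = MicrophoneAudioSource()
# do_plot=True opens the live plot of detected speaker activity
inference = StreamingInference(pipeline, mic, do_plot=True)
prediction = inference()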

Speaker 0 is detected at regular intervals, as shown in the attached image. It seems like the system is detecting artificial periodic bursts.

Image

Would appreciate any insights on what might be causing this issue!

QuentinFuxa, Mar 03 '25 16:03

Hey @QuentinFuxa, maybe your audio contains some artifacts? Have you tried to listen to it?

I haven't seen this behavior before. Does it happen with all the supported models?

juanmc2005, May 30 '25 08:05

Hi @juanmc2005, yes, it happens with all the models. But it only seems to happen when there is virtually zero noise, which does not happen in real life when the library listens to a mic or another audio source, so it's not an important issue for real use cases!

QuentinFuxa, Jun 29 '25 08:06
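
For anyone trying to reproduce this, one way to confirm that the input really is "virtually zero noise" is to measure the level of each audio chunk before it reaches the pipeline. The following is only a minimal sketch using the Python standard library, not part of diart: the helper names `rms_dbfs` and `is_near_silent`, the -120 dB floor, and the -60 dBFS threshold are all assumptions chosen for illustration, and samples are assumed to be floats in the range [-1, 1]. It does not explain the periodic detections; it only helps tell a near-silent loopback signal apart from a real microphone signal.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float], floor_db: float = -120.0) -> float:
    """Return the RMS level of float samples in [-1, 1], in dB relative to full scale.

    floor_db is a hypothetical lower bound returned for empty or all-zero input,
    where the logarithm would otherwise be undefined.
    """
    if len(samples) == 0:
        return floor_db
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db
    # 10 * log10(mean square) is the same as 20 * log10(RMS)
    return max(floor_db, 10.0 * math.log10(mean_square))


def is_near_silent(samples: Sequence[float], threshold_db: float = -60.0) -> bool:
    """Return True when the chunk's RMS level is below threshold_db (an assumed cutoff)."""
    return rms_dbfs(samples) < threshold_db


if __name__ == "__main__":
    sample_rate = 16000
    silence = [0.0] * sample_rate
    # One second of a 440 Hz tone at half of full scale
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
    print("silence:", rms_dbfs(silence), is_near_silent(silence))
    print("tone:", round(rms_dbfs(tone), 2), is_near_silent(tone))
```

If chunks from the loopback device consistently fall below the threshold while speaker 0 is still reported, that would support the observation above that the behavior is tied to near-silent input rather than to audible artifacts.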