diart
Periodic Silent Speaker Detection
First of all, thanks for the project! I’m using it in a live Speech-to-Text (STT) + diarization setup: https://github.com/QuentinFuxa/whisper_streaming_web
I am testing my pipeline with BlackHole on macOS, which routes the output audio to the input. This lets me repeatedly test STT and diarization with exactly the same audio. When I start the system, virtually no sound is being produced. However, when running a simple script with diart, I observe speaker 0 being detected periodically, even though no meaningful audio is present.
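from diart import SpeakerDiarization
from diart.sources import MicrophoneAudioSource
from diart.inference import StreamingInference
from diart.sinks import RTTMWriter  # imported but not used in this snippet
# Streaming diarization pipeline with its default configuration
pipeline = SpeakerDiarization()
# Audio source reading from the microphone input
mic = MicrophoneAudioSource()
# Run streaming inference with the live plot enabled
inference = StreamingInference(pipeline, mic, do_plot=True)
prediction = inference()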
Speaker 0 is detected at regular intervals, as shown in the attached image. It seems like the system is detecting artificial periodic bursts.
Would appreciate any insights on what might be causing this issue!
Hey @QuentinFuxa, maybe your audio contains some artifacts? Have you tried to listen to it?
I haven't seen this behavior before. Does it happen with all the supported models?
Hi @juanmc2005, yes, it happens with all the models. But it only seems to happen when there is virtually zero noise, which does not happen in real life when the lib listens to a mic or another audio source, so it's not an important issue for real use cases!
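For anyone who wants to confirm that the bursts only show up on near-silent input, one option is to measure the level of each audio chunk before it reaches the pipeline. The sketch below is plain Python and not part of diart: `rms_dbfs`, `near_silent_chunks`, and the -60 dBFS threshold are hypothetical choices for illustration, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float], eps: float = 1e-12) -> float:
    """Return the RMS level of one chunk in dBFS.

    Assumes float samples in [-1.0, 1.0]; `eps` avoids log10(0) on digital silence.
    """
    if len(samples) == 0:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square + eps)


def near_silent_chunks(
    chunks: Sequence[Sequence[float]], threshold_db: float = -60.0
) -> List[bool]:
    """Flag chunks whose RMS level falls below `threshold_db` (an assumed cutoff)."""
    return [rms_dbfs(chunk) < threshold_db for chunk in chunks]


if __name__ == "__main__":
    silence = [0.0] * 1600   # digital silence, as a loopback device may deliver
    tone = [0.5] * 1600      # constant signal at roughly -6 dBFS
    print(near_silent_chunks([silence, tone]))
```

If the chunks that coincide with the periodic speaker 0 detections are all flagged as near-silent, that would support the idea that the behavior is tied to near-zero input rather than to artifacts in the audio.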