WhisperLiveKit

Can I get the current VAD status back on the frontend in real time?

Open • AndrewKirkovski opened this issue 5 months ago • 1 comment

I'm writing a conversation agent and need the VAD status in real time to tell whether the user is trying to interrupt the current TTS answer.

I've tried using https://github.com/ricky0123/vad to run VAD in the browser and send packets to Whisper only while the user is talking, like this:

recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
recorder.ondataavailable = (e) => {
    // Forward the chunk only while the browser VAD reports speech.
    if (websocket && websocket.readyState === WebSocket.OPEN && isSpeaking) {
        console.log(e.data, e.data.size); // e.data is a Blob, which has `size`, not `length`
        websocket.send(e.data);
    } else {
        console.log('Skipped', e.data);
    }
};

where isSpeaking is the VAD status from https://github.com/ricky0123/vad.

However, with this approach I run into FFmpeg errors. It seems WhisperLiveKit relies on the stream being uninterrupted?
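A possible workaround, sketched under the assumption that the FFmpeg errors come from gaps in the WebM stream (MediaRecorder chunks are generally only decodable as one continuous sequence, because the container header arrives in the first chunk): forward every chunk to the server and use the browser VAD flag only to drive the interruption logic. The names `createChunkForwarder`, `setSpeaking`, `handleData`, and `onSpeechStart` are hypothetical and not part of WhisperLiveKit or the VAD library.

```javascript
// Sketch only: `socket` is any object exposing `readyState` and `send`
// (for example a browser WebSocket), and `onSpeechStart` is a hypothetical
// callback, e.g. one that stops TTS playback when the user starts talking.
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

function createChunkForwarder(socket, onSpeechStart) {
    let isSpeaking = false;
    return {
        // Call this from the browser VAD's speech start/end callbacks.
        setSpeaking(value) {
            // Fire the callback only on the silence -> speech transition.
            if (value && !isSpeaking) onSpeechStart();
            isSpeaking = value;
        },
        // Use as recorder.ondataavailable: every chunk is forwarded,
        // regardless of the VAD flag, so the container stream stays intact.
        handleData(e) {
            if (socket.readyState !== WS_OPEN) return false;
            socket.send(e.data);
            return true;
        },
    };
}
```

With this shape, `recorder.ondataavailable = forwarder.handleData` keeps the audio stream continuous, while the VAD callbacks call `forwarder.setSpeaking(true)` or `forwarder.setSpeaking(false)`. The trade-off is that silence is still sent over the network.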

Image

So it would be nice to have the VAD status reported back by the API in real time. Also, I was wondering whether it's possible to split the transcription on silence.

For example, if I say "Hello, how are you? <not speaking for 5 seconds> How's the weather today?", it currently produces one line: "Hello, how are you? How's the weather today?"

Could a flag make it produce two separate lines, "Hello, how are you?" and "How's the weather today?", each with its own start time?

Maybe you could advise where to look in the code for that.
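To illustrate the kind of splitting meant here, a minimal client-side sketch: it groups timestamped words into lines and starts a new line whenever the gap between two words exceeds a threshold. The `{ text, start, end }` word shape (in seconds), the `splitOnSilence` name, and the 2-second default are assumptions for illustration, not WhisperLiveKit's actual output format or options.

```javascript
// Sketch only: the { text, start, end } word shape is hypothetical.
function splitOnSilence(words, maxGapSeconds = 2.0) {
    const lines = [];
    for (const word of words) {
        const last = lines[lines.length - 1];
        if (last && word.start - last.end <= maxGapSeconds) {
            // Short gap: extend the current line.
            last.text += " " + word.text;
            last.end = word.end;
        } else {
            // First word, or a long silence: start a new line with its own start time.
            lines.push({ text: word.text, start: word.start, end: word.end });
        }
    }
    return lines;
}

const exampleLines = splitOnSilence([
    { text: "Hello,", start: 0.0, end: 0.4 },
    { text: "how", start: 0.5, end: 0.7 },
    { text: "are", start: 0.7, end: 0.9 },
    { text: "you?", start: 0.9, end: 1.2 },
    { text: "How's", start: 6.2, end: 6.6 },
    { text: "the", start: 6.6, end: 6.7 },
    { text: "weather", start: 6.7, end: 7.1 },
    { text: "today?", start: 7.1, end: 7.5 },
]);
console.log(exampleLines.map((line) => line.text));
// -> [ "Hello, how are you?", "How's the weather today?" ]
```

The second line keeps the start time of its first word (6.2 in this example), which is the "different start times" behavior asked about.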

AndrewKirkovski, Jul 07 '25 17:07

Hi, the last commit solves that. You can pull directly from GitHub or wait for the upcoming 0.2.5 release.

Image

QuentinFuxa, Aug 11 '25 15:08