WhisperLiveKit
Can I get the current VAD status back on the front end in real time?
I'm writing a conversational agent and need the live VAD status to tell whether the user is trying to interrupt the current TTS answer.
I've tried using https://github.com/ricky0123/vad to run VAD in the browser and send packets to Whisper only while the user is talking, like this:
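recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
recorder.ondataavailable = (e) => {
  // Forward the chunk only while the socket is open and VAD reports speech.
  if (websocket && websocket.readyState === WebSocket.OPEN && isSpeaking) {
    // e.data is a Blob, so its byte count is `size` (it has no `length`).
    console.log(e.data, e.data.size);
    websocket.send(e.data);
  } else {
    console.log('Skipped', e.data);
  }
};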
where isSpeaking is the VAD status from https://github.com/ricky0123/vad
However, with this approach I run into FFmpeg issues. It seems WhisperLiveKit relies on the stream being uninterrupted?
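One likely reason skipping chunks fails: the chunks MediaRecorder emits are consecutive pieces of a single WebM stream, and only the first one carries the container header, so a decoder cannot follow the stream once pieces are missing. A possible workaround, sketched below under assumptions, is to forward every chunk and use the browser VAD only to decide when to stop TTS. `createStreamingHandlers`, `isTtsPlaying`, and `stopTts` are hypothetical names invented for this sketch; none of them comes from WhisperLiveKit or the VAD library.

```javascript
// Sketch only: keep the audio stream continuous, use VAD purely for barge-in.
// `socket` is any object with `readyState` and `send` (e.g. a WebSocket).
// `isTtsPlaying` and `stopTts` are hypothetical app-supplied callbacks.
function createStreamingHandlers({ socket, isTtsPlaying, stopTts }) {
  const OPEN = 1; // numeric value of WebSocket.OPEN

  return {
    // Assign to recorder.ondataavailable as (e) => handlers.onChunk(e.data).
    // Every chunk is forwarded; nothing is dropped based on VAD state.
    onChunk(blob) {
      if (socket && socket.readyState === OPEN) {
        socket.send(blob);
        return true;
      }
      return false;
    },

    // Call from the VAD's "speech started" event.
    // Returns true when the user interrupted an answer that was playing.
    onSpeechStart() {
      if (isTtsPlaying()) {
        stopTts();
        return true;
      }
      return false;
    },
  };
}
```

With this split, the server always receives a decodable stream, and the interruption decision stays on the front end where the VAD status is already available.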
So it would be nice to have the VAD status reported back by the API in real time.
Also, I was wondering whether it's possible to split the transcription on silence.
For example, if I say "Hello, how are you? <not speaking 5 seconds> How's the weather today?" it currently produces one line: "Hello, how are you? How's the weather today?"
Could a flag be added to separate it into two lines, "Hello, how are you?" and "How's the weather today?", each with its own start time?
Maybe you could advise where to look in the code for that.
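To make the requested behavior concrete, here is a minimal client-side sketch of splitting on silence. It assumes word-level timestamps in a hypothetical `{ text, start, end }` shape (seconds); this is not WhisperLiveKit's actual output format, and `splitOnSilence` and `minSilenceSec` are names made up for illustration.

```javascript
// Sketch only: group timestamped words into lines, starting a new line
// whenever the pause before a word is at least `minSilenceSec` seconds.
// The { text, start, end } word shape is an assumption for this example.
function splitOnSilence(words, minSilenceSec = 2.0) {
  const lines = [];
  let current = null;

  for (const w of words) {
    if (current && w.start - current.end < minSilenceSec) {
      // Short pause: extend the current line.
      current.text += " " + w.text;
      current.end = w.end;
    } else {
      // First word, or a long pause: open a new line with its own start time.
      current = { text: w.text, start: w.start, end: w.end };
      lines.push(current);
    }
  }
  return lines;
}

// The example from this post, with a pause of about 5 seconds after "you?".
const example = splitOnSilence([
  { text: "Hello,", start: 0.0, end: 0.4 },
  { text: "how", start: 0.5, end: 0.7 },
  { text: "are", start: 0.7, end: 0.9 },
  { text: "you?", start: 0.9, end: 1.3 },
  { text: "How's", start: 6.3, end: 6.6 },
  { text: "the", start: 6.6, end: 6.7 },
  { text: "weather", start: 6.7, end: 7.1 },
  { text: "today?", start: 7.1, end: 7.5 },
]);
```

Here `example` holds two entries, one per utterance, each keeping the start time of its first word and the end time of its last.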
Hi, the latest commit solves that. You can pull directly from GitHub, or wait for the upcoming 0.2.5 release.