echogarden
Recognition: real-time, streaming Whisper recognition
Tokens are already decoded and displayed live during Whisper decoding, at least on the CLI.
Getting Whisper to recognize in real time (or at least near real time) is possible. However:
- Low, usable latency is really important to me: ideally responsive enough for a real-time voice chat with a language model (along with low-latency synthesis, which is already mostly ready).
- That would require some planning and code reorganization to get right.
- It needs an effective VAD (voice activity detection) strategy to cut the audio at the right places. Fortunately, Echogarden already has several working VAD implementations.
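
To make the VAD point concrete, here is a minimal sketch of how incoming audio could be cut at pauses before each piece is handed to the recognizer. It is only an illustration under stated assumptions, not Echogarden's actual design: `createSegmenter`, `SegmenterOptions` and `SpeechSegment` are hypothetical names, and a plain RMS energy threshold stands in for a real VAD implementation.

```typescript
// Hypothetical sketch: these names are illustrative, not Echogarden's API.
// A simple RMS energy threshold stands in for a real VAD.

interface SegmenterOptions {
  sampleRate: number      // samples per second
  frameMs: number         // analysis frame length, in milliseconds
  energyThreshold: number // frame RMS at or above this counts as speech
  minSilenceMs: number    // trailing silence that closes a segment
}

interface SpeechSegment {
  startTime: number       // seconds from the start of the stream
  endTime: number         // seconds, trailing silence excluded
  samples: Float32Array
}

function frameRms(samples: number[], start: number, length: number): number {
  let sum = 0
  for (let i = start; i < start + length; i++) sum += samples[i] * samples[i]
  return Math.sqrt(sum / length)
}

function createSegmenter(options: SegmenterOptions) {
  const frameLength = Math.round(options.sampleRate * options.frameMs / 1000)
  const maxSilentFrames = Math.ceil(options.minSilenceMs / options.frameMs)

  let pending: number[] = []  // samples that do not yet fill a whole frame
  let current: number[] = []  // samples of the currently open segment
  let segmentStartFrame = 0
  let silentFrames = 0        // consecutive silent frames at the segment's tail
  let frameIndex = 0          // number of frames processed so far
  let inSpeech = false

  // Close the open segment, trimming the silent frames at its end.
  function closeSegment(): SpeechSegment {
    const speechLength = current.length - silentFrames * frameLength
    const segment: SpeechSegment = {
      startTime: segmentStartFrame * options.frameMs / 1000,
      endTime: (frameIndex - silentFrames) * options.frameMs / 1000,
      samples: Float32Array.from(current.slice(0, speechLength)),
    }
    current = []
    silentFrames = 0
    inSpeech = false
    return segment
  }

  // Feed a chunk of any size; returns the segments completed by this chunk.
  function push(chunk: Float32Array): SpeechSegment[] {
    const segments: SpeechSegment[] = []
    for (let i = 0; i < chunk.length; i++) pending.push(chunk[i])

    let offset = 0
    while (pending.length - offset >= frameLength) {
      const isSpeech =
        frameRms(pending, offset, frameLength) >= options.energyThreshold

      if (isSpeech && !inSpeech) {
        inSpeech = true
        segmentStartFrame = frameIndex
      }
      if (inSpeech) {
        for (let i = offset; i < offset + frameLength; i++) current.push(pending[i])
        silentFrames = isSpeech ? 0 : silentFrames + 1
      }
      offset += frameLength
      frameIndex++

      if (inSpeech && silentFrames >= maxSilentFrames) segments.push(closeSegment())
    }
    pending = pending.slice(offset)
    return segments
  }

  // Call at end of stream to emit a segment that is still open.
  function flush(): SpeechSegment[] {
    return inSpeech ? [closeSegment()] : []
  }

  return { push, flush }
}
```

In this scheme a segment is only emitted once `minSilenceMs` of silence has passed, so that value plus the recognition time sets a floor on the latency. A real implementation would probably also need to cap segment length and to start decoding before the pause is confirmed.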