Feature Request: Add voice chat mode with interruption support
I'd like to request support for a voice chat mode in SurfSense where the user can speak a prompt and receive an audio response back, enabling full speech-to-speech interaction.
My suggestions for this feature:
- Add a real-time voice chat mode (speech-to-speech).
- Include support for user-defined stop words (such as "stop" or "cancel") that can interrupt TTS playback while it's speaking, to allow more natural and hands-free interaction.
- Ideally, let users toggle this mode on or off through the UI or config.
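To make the stop-word and toggle ideas concrete, here is a rough sketch of how a transcript from the STT step could be checked against user-defined stop words. `VoiceChatConfig` and `find_stop_word` are hypothetical names made up for illustration; nothing like them exists in SurfSense today, and the real config shape would be up to the maintainers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceChatConfig:
    """Hypothetical settings; these names are assumptions, not SurfSense options."""
    enabled: bool = False                      # the on/off toggle
    stop_words: tuple[str, ...] = ("stop", "cancel")


def _normalize(text: str) -> list[str]:
    # Lowercase and split into word tokens so "Stop!" matches "stop".
    return re.findall(r"[a-z0-9']+", text.lower())


def find_stop_word(transcript: str, config: VoiceChatConfig) -> str | None:
    """Return the first stop word (in config order) found in the transcript."""
    if not config.enabled:
        return None
    tokens = _normalize(transcript)
    for phrase in config.stop_words:
        words = _normalize(phrase)
        n = len(words)
        if n == 0:
            continue
        # Compare whole tokens so "unstoppable" does not match "stop".
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                return phrase
    return None
```

Matching on whole tokens rather than substrings keeps ordinary words that merely contain a stop word from cutting off playback. Multi-word phrases such as "shut up" work the same way.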
One possible implementation path would be to use WebRTC for capturing and streaming audio. The WebRTC native audio stack includes a Voice Activity Detection (VAD) module, which can detect when the user starts or stops speaking, enabling natural interruption of TTS and hands-free interaction. As far as I know, the browser API does not expose this VAD directly, so a web client would likely need its own detector or a library. WebRTC also provides low-latency audio transmission, echo cancellation, noise suppression, automatic gain control, and cross-platform support across browsers and mobile environments. These features make it a strong candidate for a responsive, privacy-preserving voice chat mode, especially when combined with local LLM and TTS/STT components.
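To illustrate the kind of start/stop detection that VAD provides, here is a minimal energy-threshold sketch. It is not WebRTC's VAD algorithm, just a simple stand-in under stated assumptions: frames are sequences of 16-bit PCM samples, and the `threshold` and `hangover` values are arbitrary placeholders that would need tuning.

```python
import math
from typing import Iterable, Iterator, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,   # assumed loudness cutoff, needs tuning
    hangover: int = 3,          # quiet frames required before speech "ends"
) -> Iterator[bool]:
    """Yield True for each frame while speech is considered active.

    Speech starts on the first frame at or above the threshold and is
    reported as ended once `hangover` consecutive quiet frames are seen.
    """
    quiet = hangover  # start in the "not speaking" state
    for frame in frames:
        if rms(frame) >= threshold:
            quiet = 0
        else:
            quiet = min(quiet + 1, hangover)
        yield quiet < hangover
```

The hangover counter keeps short pauses between words from being treated as the end of an utterance. A production detector would use a trained or spectral VAD rather than raw energy.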
Although VAD detects user speech, background noise in a noisy environment can also trigger it and stop TTS playback unintentionally, so stop words are still useful as an explicit interrupt.
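One possible way to combine the two signals, sketched below, is to treat voice activity as a soft pause and a recognised stop word as a hard stop, so that noise alone never ends playback for good. This policy and the names `Playback` and `next_state` are assumptions for illustration, not a description of how SurfSense or LiveKit behaves.

```python
from enum import Enum


class Playback(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    STOPPED = "stopped"


def next_state(state: Playback, speech_active: bool, heard_stop_word: bool) -> Playback:
    """Decide the TTS playback state from the latest VAD and STT signals."""
    if state is Playback.STOPPED:
        return state                # a hard stop is final for this response
    if heard_stop_word:
        return Playback.STOPPED     # explicit interrupt from the user
    if speech_active:
        return Playback.PAUSED      # might be the user, might be noise
    return Playback.PLAYING         # activity ended without a stop word: resume
```

With this policy a door slam or background chatter only pauses the response briefly, while saying "stop" ends it.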
I believe LiveKit uses WebRTC.
Looking forward to seeing this added.
Can I work on this issue?
@iamsyg Sure. Thanks for your interest 👍
Please could you label this issue as Hacktoberfest?
@iamsyg Done 👍
@MODSetter Thank you