Deepgram’s WebSocket TTS
Problem Statement
I'm seeing a 3 to 6 second delay between the user's last spoken word and the start of the bot's spoken response. Deepgram recently released a WebSocket TTS feature, which may reduce latency in real-time speech generation. My current stack is Twilio, Deepgram for both TTS and STT, and OpenAI's gpt-4o model for generating responses. The application is deployed on an EC2 c6i.2xlarge instance.
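To check how much of the delay actually comes from TTS rather than STT or the LLM, each stage of a turn could be timed separately. The following is only a rough sketch: `TurnTimer` and the stage names are hypothetical, and the `time.sleep` calls are placeholders for the real Deepgram and OpenAI calls.

```python
import time
from contextlib import contextmanager


class TurnTimer:
    """Collects wall-clock durations for each stage of one conversational turn."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        # Time the wrapped block and add the elapsed seconds to this stage's total.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        # Name of the stage with the largest accumulated duration.
        return max(self.durations, key=self.durations.get)

    def total(self):
        # Sum of all recorded stage durations, in seconds.
        return sum(self.durations.values())


if __name__ == "__main__":
    timer = TurnTimer()
    with timer.stage("stt_final_transcript"):
        time.sleep(0.01)  # placeholder for waiting on the final transcript
    with timer.stage("llm_first_token"):
        time.sleep(0.02)  # placeholder for the LLM request
    with timer.stage("tts_first_audio"):
        time.sleep(0.03)  # placeholder for the TTS request
    for name, seconds in timer.durations.items():
        print(f"{name}: {seconds * 1000:.0f} ms")
    print("slowest stage:", timer.slowest())
```

If the TTS stage turns out to be a small share of the total, switching to WebSocket TTS alone would probably not close the gap.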
Proposed Solution
Implement Deepgram's WebSocket TTS service to reduce the delay in converting text to speech.
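One way a streaming TTS connection could cut time to first audio is by forwarding each sentence as soon as the LLM has produced it, instead of waiting for the full response. Below is a minimal sketch of that chunking step. It does not use Deepgram's actual API: `send_text` is a hypothetical callback standing in for whatever sends text over the WebSocket, and the sentence-boundary rule is a deliberately simple assumption.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A single token may complete more than one sentence, so keep splitting.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Whatever is left when the stream ends is sent as the final chunk.
    tail = buffer.strip()
    if tail:
        yield tail


def stream_to_tts(tokens: Iterable[str], send_text: Callable[[str], None]) -> None:
    """Forward each sentence to the (hypothetical) TTS sender as it completes."""
    for sentence in sentences_from_tokens(tokens):
        send_text(sentence)


if __name__ == "__main__":
    llm_tokens = ["Hel", "lo th", "ere. How", " are you", "? Fine"]
    stream_to_tts(llm_tokens, print)
```

With this approach the first sentence can be synthesized while the LLM is still generating the rest of the reply.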
Alternative Solutions
No response
Additional Context
No response
Would you be willing to help implement this feature?
- [ ] Yes, I'd like to contribute
- [ ] No, I'm just suggesting
Deepgram's WebSocket API does not support interruptions, so we would have to close the WebSocket connection every time the user interrupts the bot. We've seen poor performance when tearing down the connection. For that reason, we will hold off on implementing the WebSocket version until Deepgram supports interruptions.