Deepgram’s WebSocket TTS

Open Ahmer967 opened this issue 8 months ago • 1 comment

Problem Statement

I'm seeing a 3 to 6 second delay between the user's last spoken word and the start of the bot's spoken response. I've looked into Deepgram's recently released WebSocket TTS feature, which may help reduce latency in real-time speech generation. I'm currently using Twilio, Deepgram for both TTS and STT, and OpenAI's gpt-4o model to generate responses. The application is deployed on an EC2 c6i.2xlarge instance.
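To see where those 3 to 6 seconds go before changing the TTS transport, it may help to time each stage separately. Below is a minimal sketch using only the Python standard library; the stage names (`stt_final`, `llm_first_token`, `tts_first_byte`) and the `StageTimer` helper are illustrative assumptions, not part of Deepgram, Twilio, or OpenAI.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock durations for named pipeline stages (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable so the timer can be tested deterministically
        self.durations = {}      # stage name -> total seconds spent in that stage

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        lines = [f"{name}: {secs:.3f}s" for name, secs in self.durations.items()]
        lines.append(f"total: {sum(self.durations.values()):.3f}s")
        return "\n".join(lines)


if __name__ == "__main__":
    timer = StageTimer()
    # The sleeps are placeholders for the real calls in the pipeline.
    with timer.stage("stt_final"):
        time.sleep(0.01)
    with timer.stage("llm_first_token"):
        time.sleep(0.01)
    with timer.stage("tts_first_byte"):
        time.sleep(0.01)
    print(timer.report())
```

If most of the time turns out to be in STT endpointing or the LLM's first token, switching the TTS transport alone is unlikely to close the gap.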

Proposed Solution

Implement Deepgram's WebSocket TTS service to reduce the delay in converting text to speech.

Alternative Solutions

No response

Additional Context

No response

Would you be willing to help implement this feature?

[ ] Yes, I'd like to contribute
[ ] No, I'm just suggesting

Apr 29 '25 16:04 Ahmer967

Deepgram's WebSocket API does not support interruptions, so we would have to tear down the WebSocket connection every time the user interrupts the bot. We've seen poor performance when tearing down the WebSocket. For that reason, we will hold off on implementing the WebSocket version until interruptions are supported.
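As a rough illustration of the teardown-on-interrupt pattern described above, here is a sketch with a fake connection class. `FakeTTSConnection`, `play`, and `run_turn` are hypothetical names for illustration only; this is not Deepgram's API and it makes no network calls.

```python
import asyncio


class FakeTTSConnection:
    """Hypothetical stand-in for a streaming TTS socket; not Deepgram's API."""

    opened = 0  # how many connections have been created so far

    def __init__(self):
        FakeTTSConnection.opened += 1
        self.closed = False

    async def speak(self, text):
        # Simulate audio arriving one word at a time.
        for word in text.split():
            await asyncio.sleep(0.01)
            yield word

    async def close(self):
        self.closed = True


async def play(conn, text, spoken):
    """Consume the simulated audio stream, recording each chunk that was played."""
    async for chunk in conn.speak(text):
        spoken.append(chunk)


async def run_turn(text, interrupt_after=None):
    """Speak `text`; on interruption, drop the connection and open a fresh one."""
    conn = FakeTTSConnection()
    spoken = []
    task = asyncio.create_task(play(conn, text, spoken))
    if interrupt_after is None:
        await task
        return conn, spoken
    await asyncio.sleep(interrupt_after)  # the user starts talking here
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # With no way to interrupt in-flight audio, the only way to stop it is to
    # close the socket and reconnect before the next bot turn.
    await conn.close()
    conn = FakeTTSConnection()
    return conn, spoken
```

In this sketch every interruption costs one close plus one new connection, which is the step where the poor performance was observed.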

May 06 '25 15:05 markbackman