Resemble's Websocket TTS integration with Pipecat
Add Resemble AI TTS Integration
This PR adds a new ResembleTTSService class that integrates Resemble AI's streaming TTS API with Pipecat.
Key Features
- Real-time streaming of TTS audio using Resemble AI's WebSocket API
- Support for:
  - Custom voice selection via UUID
  - Configurable sample rate (default 48kHz)
  - PCM 16-bit audio format
- Proper frame handling including:
  - TTSStartedFrame
  - TTSAudioRawFrame
  - TTSStoppedFrame
  - ErrorFrame for error cases
- Metrics tracking (TTFB and usage metrics)
- Clean connection handling and resource cleanup
Implementation Details
- Uses websockets library for WebSocket communication
- Handles base64-encoded audio content from API
- Includes proper error handling for:
  - Connection issues
  - API errors
  - Unexpected disconnects
- Follows Pipecat's service pattern with async generators
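For reviewers, the base64 handling and frame ordering listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the message field names (type, audio_content) are assumptions about the API's JSON schema, the frame classes are simplified stand-ins rather than Pipecat's real frame types, and a plain generator stands in for the async generator.

```python
import base64
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Stand-in frame types for illustration only; Pipecat's real frame
# classes have their own constructors and fields.
@dataclass
class Started:
    pass


@dataclass
class Audio:
    pcm: bytes
    sample_rate: int


@dataclass
class Stopped:
    pass


@dataclass
class Error:
    message: str


Frame = Union[Started, Audio, Stopped, Error]


def frames_from_messages(
    messages: Iterable[str], sample_rate: int = 48000
) -> Iterator[Frame]:
    """Turn raw JSON messages into a started/audio/stopped frame sequence."""
    yield Started()
    for raw in messages:
        msg = json.loads(raw)
        kind = msg.get("type")  # assumed field name
        if kind == "audio":
            # Assumed: audio arrives as base64-encoded 16-bit PCM.
            yield Audio(base64.b64decode(msg["audio_content"]), sample_rate)
        elif kind == "audio_end":
            break
        elif kind == "error":
            yield Error(str(msg.get("message", "unknown error")))
            break
    # Always close the utterance, even after an error.
    yield Stopped()
```

Emitting the stopped frame on every path keeps downstream processors from waiting on an utterance that failed midway.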
Usage Example
tts = ResembleTTSService(
    api_key="your_api_key",
    voice_uuid="your_voice_uuid",
    sample_rate=48000,  # optional
)
Docs: https://docs.app.resemble.ai/docs/text_to_speech/streaming_websocket
@markbackman this is ready for approval!
Hey @krishvadhani19 I just set up an example with this and unfortunately, it doesn't run. Does this work for you?
Also, comparing with the docs, I see that a sample rate of 48 kHz is used, but that's not a supported sample rate.
UPDATE: Hmm, it looks like the WebSocket service requires a $699/mo Business plan. Maybe that's my issue.
For us to accept a submission, it would have to mimic the other TTS services. A good one to look at would be CartesiaTTSService. The only difference between Resemble.ai and Cartesia is that it appears that Resemble doesn't support word/timestamp pairs.
It might support context_ids via the request_id feature, which would be required for interruptions to work properly, so that should be implemented too. If you're interested in working on that, it would be great.
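To make the interruption requirement concrete, here is a rough sketch of the idea: tag each utterance with an id and drop any audio that arrives for an id that is no longer active. It assumes the server echoes a request_id field back on each message, which I haven't confirmed for Resemble; RequestTracker is a hypothetical helper, not an existing Pipecat class.

```python
import itertools
from typing import Optional


class RequestTracker:
    """Tracks which request's audio is still wanted (hypothetical helper)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._active: Optional[int] = None

    def start(self) -> int:
        """Begin a new utterance and return its request id."""
        self._active = next(self._counter)
        return self._active

    def interrupt(self) -> None:
        """Mark the current utterance as cancelled."""
        self._active = None

    def accepts(self, message: dict) -> bool:
        """True only if the message belongs to the active request."""
        # Assumed: each server message carries the originating request_id.
        return (
            self._active is not None
            and message.get("request_id") == self._active
        )
```

With something like this, audio still in flight for an interrupted utterance is discarded instead of being played over the user's speech.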
Also, you should:
- Create an example similar to 07-interruptible.py for testing.
- Add @traced_tts to the run_tts method to enable tracing.
- Add any optional dependencies to pyproject.toml under the key resemble. It looks like websockets is needed.
- Update dot-env.template with the required keys.
- Add a CHANGELOG entry.
@krishvadhani19 can you reply? Otherwise, I'll close this out due to inactivity.
hi @markbackman I will make changes to the PR. Apologies for the delay.
hi @markbackman made all above mentioned changes, ready for approval!
@krishvadhani19 does this require a Business Plan ($699/mo) to test?
From the docs, I see:
Note: Websocket API is only available for Business plan users. If you're running into trouble, upgrade to a Business plan or higher on the billing page.
@krishvadhani19 I just removed your message as it contained a key. You might want to rotate it. If you want to share a key, that would be great, but it might be better done via a Discord DM. You can find me on the Pipecat Discord as MarkAtDaily.
@krishvadhani19 Sorry for so many comments! There's a lot to take into account for building a TTS service.
One more question: do you have an idle timeout (e.g. disconnect the websocket after N seconds of no input)? If so, you'll need to implement a keepalive function.
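In case it helps, a keepalive can be as small as a background task that sends something on a timer until it is cancelled. The sketch below is generic asyncio, not taken from an existing service: send_ping is a hypothetical callable, and what the server actually accepts as a keepalive (a protocol-level ping or an application message) is something you'd need to confirm.

```python
import asyncio
from typing import Awaitable, Callable


async def keepalive(
    send_ping: Callable[[], Awaitable[None]], interval: float
) -> None:
    """Call send_ping every `interval` seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send_ping()


async def demo() -> int:
    """Run the keepalive briefly against a fake sender and count the pings."""
    pings = 0

    async def fake_ping() -> None:
        nonlocal pings
        pings += 1

    task = asyncio.create_task(keepalive(fake_ping, 0.01))
    await asyncio.sleep(0.055)
    # Cancel on disconnect so the loop does not outlive the connection.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return pings
```

The interval should sit comfortably below the server's idle timeout, and the task should be cancelled wherever the service tears down its connection.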
I haven't tried to run the code yet, but after you clean up these comments, I'll give it a go! I'll provide more timely feedback next time :)
This has been stale for a while - can we expect any updates soon? @krishvadhani19
Hi team, apologies for the delay. I got stuck with a few releases and urgent demos.
Thank you for the patience. I will get it all fixed by this week.
Very interesting! Planning to launch this week?