Streaming implementation for Google TTS (Chirp3 HD)

Open manishkjs1 opened this issue 7 months ago • 6 comments

Problem Statement

I am part of the GCP sales engineering team and would like to request integration of the streaming module into the TTS (Chirp3 HD) code.

Right now, latency in batch mode is far too high for any reasonable voice-to-voice conversation.

Proposed Solution

The exact module is StreamingSynthesizeRequest and API details are here.

TTS latency in streaming mode (~200 ms) is much lower than in batch mode (~800 ms).
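
For reference, a minimal sketch of how the streaming call is typically driven from the `google-cloud-texttospeech` Python client: the first request on the stream carries only the configuration, and each following request carries a piece of text. The helper names (`streaming_requests`, `synthesize`) and the default voice name are illustrative assumptions, not Pipecat code. The library module is passed in as `tts` only so the request ordering can be exercised without credentials.

```python
from typing import Any, Iterable, Iterator


def streaming_requests(
    tts: Any, voice_name: str, language_code: str, text_chunks: Iterable[str]
) -> Iterator[Any]:
    """Yield the config request first, then one input request per text chunk.

    `tts` is expected to be the `google.cloud.texttospeech` module.
    """
    config = tts.StreamingSynthesizeConfig(
        voice=tts.VoiceSelectionParams(name=voice_name, language_code=language_code)
    )
    # The first message on the stream contains only the streaming config.
    yield tts.StreamingSynthesizeRequest(streaming_config=config)
    for text in text_chunks:
        # Every later message contains plain text to synthesize.
        yield tts.StreamingSynthesizeRequest(
            input=tts.StreamingSynthesisInput(text=text)
        )


def synthesize(
    text_chunks: Iterable[str],
    voice_name: str = "en-US-Chirp3-HD-Charon",  # example voice, an assumption
    language_code: str = "en-US",
) -> Iterator[bytes]:
    """Stream audio bytes. Requires google-cloud-texttospeech and GCP credentials."""
    from google.cloud import texttospeech as tts  # imported lazily on purpose

    client = tts.TextToSpeechClient()
    responses = client.streaming_synthesize(
        streaming_requests(tts, voice_name, language_code, text_chunks)
    )
    for response in responses:
        # Audio arrives in pieces, so playback can start before synthesis ends.
        yield response.audio_content
```

Because audio is yielded as it arrives, the first chunk can be played while later text is still being synthesized, which is where the latency gain would come from.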

Alternative Solutions

No response

Additional Context

I have already made a similar update in LiveKit and it works there. I would love to create a PR for this and contribute.

Would you be willing to help implement this feature?

  • [x] Yes, I'd like to contribute
  • [ ] No, I'm just suggesting

manishkjs1 • May 14 '25 17:05

@markbackman @aconchillo - FYR please

manishkjs1 • May 14 '25 17:05

Hello, I wanted to try this out and implemented the change. It is very fast indeed!

  • I noticed slight quality degradation (everything else unchanged). Do you have an idea what could cause this? Sample rate is 24000.
  • StreamingAudioConfig should support speaking_rate, but the parameter is reported as unknown. Maybe Pipecat uses an older version of the client library? Thank you!
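
One quick way to test the "older version" theory is to print which version of the client library is actually installed in the environment. This is a small sketch using only the standard library. The function name `installed_version` is made up for illustration, and it makes no claim about which release first accepts `speaking_rate`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "google-cloud-texttospeech") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Compare this against the library changelog to see whether the
    # installed release is expected to know about speaking_rate.
    print(installed_version())
```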

aristid • May 19 '25 17:05

Ach, I see, StreamingSynthesisInput doesn't support SSML, so we should keep the non-streaming version for SSML and older voices.
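
A rough sketch of what that fallback could look like: route a request to the streaming path only when the text is not SSML and the voice is a Chirp 3 HD voice. The function name, the `<speak` prefix check, and the substring match on the voice name are all simplifying assumptions for illustration. A real service might expose an explicit flag instead.

```python
def use_streaming(text: str, voice_name: str) -> bool:
    """Decide whether a request can go through the streaming API.

    Assumptions: SSML input starts with a <speak> element, and Chirp 3 HD
    voices can be recognized by "Chirp3-HD" in the voice name.
    """
    is_ssml = text.lstrip().startswith("<speak")
    is_chirp3_hd = "chirp3-hd" in voice_name.lower()
    # Streaming input is plain text only, so SSML and older voices
    # fall back to the non-streaming path.
    return is_chirp3_hd and not is_ssml
```

With this check, `use_streaming("<speak>Hi</speak>", "en-US-Chirp3-HD-Charon")` is `False`, so SSML input still takes the non-streaming route.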

aristid • May 19 '25 20:05

@aristid one option is to keep a streaming version and non-streaming version. We have other examples of this. For example CartesiaTTSService is the websocket API service and CartesiaHttpTTSService is the HTTP API service. We could follow a similar pattern for GoogleTTSService. That would be GoogleTTSService for streaming and GoogleHttpTTSService for non-streaming (or some other concise but descriptive name).

markbackman • May 20 '25 19:05

I have it working now. Should I make a pull request? Call it GoogleStreamingTTSService, to keep the HTTP version working? It really is much faster, which makes standard Gemini a feasible alternative to Gemini Live. The quality is a bit worse though; I hope @manishkjs1 can say something about that.

aristid • May 20 '25 21:05

> should I make a pull request?

Yes, please! 🙌

> Call it GoogleStreamingTTSService, to keep the HTTP version working? It really is much faster, which makes standard Gemini a feasible alternative to Gemini Live. The quality is a bit worse though; I hope @manishkjs1 can say something about that.

Our convention has been:

  • f"{service_name}TTSService" for the primary implementation, e.g. GoogleTTSService
  • f"{service_name}{description}TTSService" for the second implementation. To date, the description has been Http, so GoogleHttpTTSService
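
Spelled out as code, the convention above amounts to a single f-string. The helper name `tts_service_class_name` is hypothetical and only illustrates the naming rule. It is not part of Pipecat.

```python
def tts_service_class_name(service_name: str, description: str = "") -> str:
    """Build a TTS service class name following the convention above.

    An empty description gives the primary implementation's name.
    """
    return f"{service_name}{description}TTSService"
```

For example, `tts_service_class_name("Google")` gives `GoogleTTSService` for the streaming service and `tts_service_class_name("Google", "Http")` gives `GoogleHttpTTSService` for the non-streaming one.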

We can note the change in the changelog.

markbackman • May 21 '25 19:05

What about STT for Chirp 3?

idotr7 • Sep 12 '25 06:09