[Feat]: google tts streaming with async client

Open jayeshp19 opened this issue 8 months ago • 6 comments

Saw this example using Google TTS streaming with Chirp3. I'm trying to do something similar with the async client but running into issues: only the first sentence from the iterator is played.

Here’s a minimal repro:

import asyncio

from google.cloud import texttospeech


async def process_streaming_synthesis():
    client = texttospeech.TextToSpeechAsyncClient()

    streaming_config = texttospeech.StreamingSynthesizeConfig(
        voice=texttospeech.VoiceSelectionParams(
            name="en-US-Chirp3-HD-Charon",
            language_code="en-US",
        ),
        streaming_audio_config=texttospeech.StreamingAudioConfig(
            audio_encoding=texttospeech.AudioEncoding.OGG_OPUS
        ),
    )

    config_request = texttospeech.StreamingSynthesizeRequest(streaming_config=streaming_config)

    text_iterator = [
        "Hello there.",
        "How are you today?",
        "It's such nice weather outside.",
    ]

    async def request_generator():
        yield config_request
        for text in text_iterator:
            await asyncio.sleep(0)  # yield control to event loop
            yield texttospeech.StreamingSynthesizeRequest(
                input=texttospeech.StreamingSynthesisInput(text=text)
            )

    with open("output.ogg", "wb") as audio_file:
        stream = await client.streaming_synthesize(request_generator())
        async for response in stream:
            audio_file.write(response.audio_content)

    print("Complete audio saved to output.ogg")


def main():
    asyncio.run(process_streaming_synthesis())


if __name__ == "__main__":
    main()

Would really appreciate any pointers or a working example. Thanks!

PTAL: @inardini @holtskinner

Apr 02 '25 14:04 jayeshp19

Bumping this @holtskinner @inardini

Apr 03 '25 18:04 jayeshp19

@holtskinner @inardini - Just spoke to Google engineering internally, and it seems this fix will cut latency from ~800 ms to ~200 ms. Could you please help identify who can expedite a resolution?

Apr 25 '25 16:04 manishkjs1

@jayeshp19 - According to engineering, this is a known internal issue that occurs when you use an audio_encoding other than PCM with TTS streaming.

Can you please replace OGG_OPUS with PCM?
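
A rough sketch of what that switch might look like, not a verified fix. It assumes the same voice as the repro and that the installed google-cloud-texttospeech version exposes AudioEncoding.PCM. The helper names build_pcm_streaming_config and write_pcm_chunks_to_wav are hypothetical, and the 24 kHz mono sample rate is an assumption to check against the voice you use. PCM responses carry raw 16-bit samples with no container header, so the sketch wraps the chunks in a WAV file using the standard-library wave module.

```python
import wave
from typing import Iterable

# Assumption: 24 kHz mono output; verify for the voice you use.
SAMPLE_RATE_HZ = 24000


def build_pcm_streaming_config():
    """Same config as the repro, but requesting PCM instead of OGG_OPUS."""
    # Imported lazily so the WAV helper below works without the client library.
    from google.cloud import texttospeech

    return texttospeech.StreamingSynthesizeConfig(
        voice=texttospeech.VoiceSelectionParams(
            name="en-US-Chirp3-HD-Charon",
            language_code="en-US",
        ),
        streaming_audio_config=texttospeech.StreamingAudioConfig(
            audio_encoding=texttospeech.AudioEncoding.PCM
        ),
    )


def write_pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    path: str,
    sample_rate_hz: int = SAMPLE_RATE_HZ,
) -> int:
    """Write raw 16-bit mono PCM chunks to a WAV file; return bytes written."""
    total_bytes = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumption)
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate_hz)
        for chunk in chunks:
            wav_file.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes
```

In the repro, this would mean collecting each response.audio_content from the async for loop into a list and passing it to write_pcm_chunks_to_wav with an output path such as "output.wav", instead of writing the bytes straight to output.ogg.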

Apr 25 '25 17:04 manishkjs1

@manishkjs1 confirmed this has been fixed. Feel free to close the issue now.

Apr 27 '25 00:04 davidzhao

Thanks @davidzhao, but I think there is a little confusion here.

This issue is rather about introducing StreamingSynthesizeRequest capability in the tts.py module of the Google plugin, which I think is still pending. @jayeshp19, will you continue to work on it?

Apr 27 '25 07:04 manishkjs1

Closing this one as Google supports streaming with PCM encoding. Thanks @manishkjs1

May 02 '25 11:05 jayeshp19