Implement LMNT full-duplex TTS generation

Open chadbailey59 opened this issue 10 months ago • 1 comments

LMNT has an interesting approach to 'full-duplex' TTS:

async def main():
  async with Speech() as speech:
    connection = await speech.synthesize_streaming('lily')
    t1 = asyncio.create_task(reader_task(connection))
    t2 = asyncio.create_task(writer_task(connection))
    await asyncio.gather(t1, t2)
    connection.close()

This doesn't really work with our FrameProcessor architecture, but it does actually work pretty well with the idea of a custom pipeline. I've started building that out in examples/foundational/02a-async-llm-say-one-thing.py.
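For reference, the reader/writer pattern in the snippet above can be sketched without any network access. This is a minimal illustration, not LMNT's API: `FakeConnection`, `append_text`, `finish`, `receive`, and `run_duplex` are all hypothetical names, and the "audio" is just the encoded text. The point is only that one task pushes text in while another pulls audio out of the same connection, concurrently.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a streaming TTS connection (not LMNT's API)."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def append_text(self, text):
        # Pretend each piece of text is synthesized into one audio chunk.
        await self._queue.put(text.encode("utf-8"))

    async def finish(self):
        # A None sentinel marks the end of the audio stream.
        await self._queue.put(None)

    async def receive(self):
        return await self._queue.get()


async def writer_task(conn, pieces):
    # Send text incrementally, as an LLM would produce it.
    for piece in pieces:
        await conn.append_text(piece)
        await asyncio.sleep(0)  # yield so the reader can interleave
    await conn.finish()


async def reader_task(conn):
    # Collect audio chunks until the end-of-stream sentinel arrives.
    chunks = []
    while True:
        chunk = await conn.receive()
        if chunk is None:
            break
        chunks.append(chunk)
    return chunks


async def run_duplex(pieces):
    conn = FakeConnection()
    reader = asyncio.create_task(reader_task(conn))
    writer = asyncio.create_task(writer_task(conn, pieces))
    # gather takes the awaitables as separate arguments and must be awaited.
    chunks, _ = await asyncio.gather(reader, writer)
    return chunks


if __name__ == "__main__":
    print(asyncio.run(run_duplex(["Hello, ", "world."])))
```

Because both directions share one connection and run as sibling tasks, this shape does not map directly onto a one-frame-in, one-frame-out processor, which is presumably why a custom pipeline is a more natural fit.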

I'm currently stuck because I think there's a bug in their implementation. After I get the first chunk of audio back, the websocket connection to LMNT seems to close. I've tried logging just about everywhere I can think of, and I can't find an explanation. I'm going to try and get in touch with the LMNT team to troubleshoot.

(I found out about LMNT through Andy Korman, who I met at our Voice + AI Summit.)

chadbailey59, Mar 26 '24 15:03

Hi Chad, this is awesome! I patched this and was initially able to repro the issue you describe, but after it manifested on my first test room and connection, it now works every time.

Do you have specific repro steps that you find surface the issue each time? This would make debugging easier.

My guess is that the way the pipeline logic is set up for a new room, or an edge case on entering a room, is causing the in/out tasks to conclude immediately. I'm also open to getting on Slack or a VC to discuss more efficiently, whenever works for you. FYI @sayitwithai
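One way to check whether the in/out tasks are ending early is to attach a done callback to each task and record how it finished. The following is a rough sketch using only the standard library; `watch`, `quick_exit`, `boom`, and `demo` are hypothetical helpers for illustration and are not part of pipecat or LMNT.

```python
import asyncio
import logging

logger = logging.getLogger("duplex-debug")


def watch(task, name, events):
    """Record how a task ended: cancelled, failed, or returned normally."""

    def _on_done(t):
        if t.cancelled():
            outcome = "cancelled"
        elif t.exception() is not None:
            outcome = f"failed: {t.exception()!r}"
        else:
            outcome = "finished"
        events.append((name, outcome))
        logger.warning("%s %s", name, outcome)

    task.add_done_callback(_on_done)
    return task


async def quick_exit():
    # Simulates a task that concludes immediately without error.
    return None


async def boom():
    # Simulates a task that dies because the socket went away.
    raise ConnectionError("socket closed")


async def demo():
    events = []
    t1 = watch(asyncio.create_task(quick_exit()), "reader", events)
    t2 = watch(asyncio.create_task(boom()), "writer", events)
    # return_exceptions=True keeps one failure from hiding the other task's outcome.
    await asyncio.gather(t1, t2, return_exceptions=True)
    await asyncio.sleep(0)  # give any pending done callbacks a chance to run
    return events


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(asyncio.run(demo()))
```

If a task logs "finished" right after the first audio chunk, the loop inside it is exiting on its own; a "failed" entry would instead point at the connection being closed from the other side.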

shaper, Mar 26 '24 22:03

Closing in favor of #391

aconchillo, Aug 28 '24 04:08