pipecat
Implement LMNT full-duplex TTS generation
LMNT has an interesting approach to 'full-duplex' TTS:
```python
import asyncio
from lmnt.api import Speech

async def main():
    async with Speech() as speech:
        connection = await speech.synthesize_streaming('lily')
        # reader_task pulls synthesized audio; writer_task pushes text to synthesize
        t1 = asyncio.create_task(reader_task(connection))
        t2 = asyncio.create_task(writer_task(connection))
        await asyncio.gather(t1, t2)
        connection.close()
```
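(`reader_task` and `writer_task` aren't shown above; here's a rough sketch of what they look like, based on my reading of LMNT's streaming example. The method names `append_text` and `finish`, and the `message['audio']` shape, are how I remember their docs and may not match the current SDK exactly.)

```python
async def writer_task(connection):
    # Push text to LMNT as it becomes available (e.g. streamed LLM tokens).
    for text in ['Hello, ', 'world!']:
        await connection.append_text(text)
    # Tell LMNT no more text is coming so it can flush the remaining audio.
    await connection.finish()

async def reader_task(connection):
    # Audio arrives asynchronously, interleaved with the writes above.
    async for message in connection:
        audio_bytes = message['audio']
        # ...hand audio_bytes to the transport / play it out...
```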
This doesn't really work with our FrameProcessor architecture, but it does map pretty well onto the idea of a custom pipeline. I've started building that out in examples/foundational/02a-async-llm-say-one-thing.py.
I'm currently stuck because I think there's a bug in their implementation. After I get the first chunk of audio back, the websocket connection to LMNT seems to close. I've tried logging just about everywhere I can think of, and I can't find an explanation. I'm going to try and get in touch with the LMNT team to troubleshoot.
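One thing I'm planning to try next is instrumenting the reader loop so the exact point of closure shows up in the logs. A minimal sketch, assuming the connection is the async iterator from the snippet above:

```python
import logging

async def reader_task(connection):
    chunks = 0
    try:
        async for message in connection:
            chunks += 1
            logging.debug("got audio chunk %d (%d bytes)", chunks, len(message['audio']))
    except Exception:
        logging.exception("LMNT connection raised after %d chunk(s)", chunks)
        raise
    else:
        # If we land here after only one chunk, the server closed the stream cleanly.
        logging.warning("LMNT stream ended normally after %d chunk(s)", chunks)
```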
(I found out about LMNT through Andy Korman, who I met at our Voice + AI Summit.)
Hi Chad, this is awesome! I patched this in and was initially able to repro the issue you describe, but after it manifested on my first test room and connection, it now works every time.
Do you have specific repro steps that you find surface the issue each time? This would make debugging easier.
A guess is that the way the pipeline logic is set up for a new room, or an edge case on entering a room, is causing the in/out tasks to conclude immediately. I'm also open to getting on Slack or a VC to discuss more efficiently, whenever works for you. FYI @sayitwithai
Closing in favor of #391