agents icon indicating copy to clipboard operation
agents copied to clipboard

Fix bug with before_tts_cb when speech is string but before_tts_cb returns async iterable

Open martin-purplefish opened this issue 1 year ago • 3 comments

If you said agent.say(), but your before_tts_cb returned an AsyncIterable, it'd throw - because the transcript was a string but the before_tts_cb was AsyncIterable.

martin-purplefish avatar Oct 12 '24 21:10 martin-purplefish

🦋 Changeset detected

Latest commit: 7300f54d74bc20ebd4d74b7ac404dbb11f18344f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
livekit-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

changeset-bot[bot] avatar Oct 12 '24 21:10 changeset-bot[bot]

PTAL @davidzhao @theomonnom - ran into this trying to do some fancy stuff with the LLM responses :D

martin-purplefish avatar Oct 12 '24 21:10 martin-purplefish

Alright cleaned it up a bunch. Ready for review

martin-purplefish avatar Oct 13 '24 20:10 martin-purplefish

@davidzhao @theomonnom

martin-purplefish avatar Oct 14 '24 16:10 martin-purplefish

Hey, I think we still want to use the "synthesize" method instead of the "stream" when the input is a string? wdyt?

I'm just not sure about this part:

async def _str_to_aiter(s: str) -> AsyncIterable[str]:
    yield s

theomonnom avatar Oct 14 '24 19:10 theomonnom

Hey, I think we still want to use the "synthesize" method instead of the "stream" when the input is a string? wdyt?

I'm just not sure about this part:

async def _str_to_aiter(s: str) -> AsyncIterable[str]:
    yield s

I think we still need it. This way we could treat the string as a stream that yields just that token, and just have the stream_synthesis_task instead of also str_synthesis_task. Otherwise we have to deal with 4 different cases of tts_source and transcript_source being str or iterable (all of the permutations are possible depending on how the user calls the LLM and what types before_tts_cb returns)

martin-purplefish avatar Oct 14 '24 19:10 martin-purplefish

By the synthesize method, I mean the TTS.synthesize

theomonnom avatar Oct 14 '24 20:10 theomonnom

By the synthesize method, I mean the TTS.synthesize

Oh I see, there's another wrinkle :D

How about this - we stream if it supports it, otherwise we throw if both the tts_source and the transcript_source aren't strings?

martin-purplefish avatar Oct 14 '24 20:10 martin-purplefish

What do you think of this @theomonnom ?

martin-purplefish avatar Oct 14 '24 21:10 martin-purplefish

Hey, I think this makes sense, what do you think about always following what is the type of tts_source. And even if the transcript_source is an AsyncIterable, we can still create a task that listen to it inside the str_synthesis_task

theomonnom avatar Oct 14 '24 23:10 theomonnom

Got it, yeah I can do that.

martin-purplefish avatar Oct 14 '24 23:10 martin-purplefish

@theomonnom ptal

martin-purplefish avatar Oct 14 '24 23:10 martin-purplefish