agents
Fix bug with before_tts_cb when speech is string but before_tts_cb returns async iterable
If you called agent.say() with a string, but your before_tts_cb returned an AsyncIterable, it'd throw, because the transcript source was a string while the before_tts_cb output was an AsyncIterable.
🦋 Changeset detected
Latest commit: 7300f54d74bc20ebd4d74b7ac404dbb11f18344f
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 1 package
| Name | Type |
|---|---|
| livekit-agents | Patch |
PTAL @davidzhao @theomonnom - ran into this trying to do some fancy stuff with the LLM responses :D
Alright cleaned it up a bunch. Ready for review
@davidzhao @theomonnom
Hey, I think we still want to use the "synthesize" method instead of the "stream" when the input is a string? wdyt?
I'm just not sure about this part:
    async def _str_to_aiter(s: str) -> AsyncIterable[str]:
        yield s
I think we still need it. This way we can treat the string as a stream that yields just that one token, and have only the stream_synthesis_task instead of also a str_synthesis_task. Otherwise we have to deal with 4 different cases of tts_source and transcript_source each being a str or an iterable (all of the permutations are possible, depending on how the user calls the LLM and what type before_tts_cb returns).
By the synthesize method, I mean the TTS.synthesize
Oh I see, there's another wrinkle :D
How about this - we stream if it supports it, otherwise we throw if both the tts_source and the transcript_source aren't strings?
What do you think of this @theomonnom ?
Hey, I think this makes sense. What do you think about always following the type of tts_source? Even if the transcript_source is an AsyncIterable, we can still create a task that listens to it inside the str_synthesis_task.
Got it, yeah I can do that.
@theomonnom ptal