[Very Important for LiveKit] Request: add a general mechanism to customize the plugins of VoiceAssistant ASAP

Open taylorgwei opened this issue 4 months ago • 4 comments

LiveKit brings very good RTC to the world, both as open source and as a cloud service. Awesome! But the LiveKit Agent framework has one big problem:

The LiveKit VoiceAssistant pipeline is hardcoded as the combination VAD+STT+LLM+TTS, which makes it hard to customize and causes a lot of problems whenever someone wants to add a plugin to the pipeline or remove one from it. The following cases fail 100% of the time:

- If you only need VAD+STT+LLM (TTS removed), VoiceAssistant may crash.
- If you remove the VAD plugin (but keep the others), VoiceAssistant may crash.
- If you organize it as just VAD + Multimodal, VoiceAssistant may crash.
- If you want extra processing after TTS, there is no way to insert a plugin at the end of the pipeline.
- If you want to customize chat_ctx dynamically, it is complex and requires a lot of code changes.

The related hardcoding is here:

    assistant = VoiceAssistant(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

Result: right now the Agent framework is good for demos but not for production, because every customer has very specific demands that call for a general and easy way to customize the flow.

Expected: VoiceAssistant should be a general pipeline framework that only manages the data flow (text, voice) between plugins and connects the plugins to complete a task. It should NOT depend on the type or purpose of a plugin, and it should NOT care about a plugin's internal logic.
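As a rough illustration of that expectation, here is a minimal sketch in Python. Every name in it (`Pipeline`, `Stage`, `Frame`, `fake_stt`, `fake_llm`, `fake_tts`) is hypothetical and not part of the LiveKit Agents API, and the stages are toy stand-ins rather than real plugins. It only shows the idea of a pipeline that moves data between stages without knowing what each stage does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union

# Hypothetical frame type: text or raw audio bytes. Not a LiveKit type.
Frame = Union[str, bytes]
# A stage is any callable that maps one frame to another.
Stage = Callable[[Frame], Frame]


@dataclass
class Pipeline:
    """Passes a frame through an ordered list of stages.

    The pipeline only moves data; it does not know whether a stage is
    STT, LLM, TTS, or something custom.
    """

    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage, index: Optional[int] = None) -> "Pipeline":
        """Append a stage, or insert it at `index` if one is given."""
        if index is None:
            self.stages.append(stage)
        else:
            self.stages.insert(index, stage)
        return self

    def run(self, frame: Frame) -> Frame:
        """Feed `frame` through every stage in order and return the result."""
        for stage in self.stages:
            frame = stage(frame)
        return frame


# Toy stand-ins for real plugins (assumptions for illustration only).
def fake_stt(audio: Frame) -> Frame:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def fake_llm(text: Frame) -> Frame:
    return f"reply to: {text}"


def fake_tts(text: Frame) -> Frame:
    return text.encode("utf-8")  # pretend the encoded text is synthesized audio


# Case 1: no TTS at all, so the pipeline ends with text.
text_only = Pipeline([fake_stt, fake_llm])

# Case 2: full chain plus an extra processing step after TTS.
with_post = Pipeline([fake_stt, fake_llm, fake_tts]).add(lambda audio: audio + b"!")
```

In this sketch, dropping TTS or appending a step after it is just a change to the stage list. A real implementation would also need async streaming, interruption handling, and shared context such as chat_ctx, none of which is covered here.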

BTW: the latest version seems better than before, since it splits VoiceAssistant into a Pipeline concept, but the stages are still hardcoded inside.

taylorgwei • Oct 02 '24 16:10