LiveKit multimodal agent (Gemini) skipping function calls and hallucinating
We are experiencing hallucinations. Sometimes it works perfectly, but sometimes it skips function calls and hallucinates. Is there any way we can make it not hallucinate?
Hey, have you tried the latest version? Can you provide more details?
Hello,
I'm currently using the following versions:
- livekit-plugins-google: 0.11.2
- livekit-agents: 0.12.18
- Model: gemini-2.0-flash-exp
I'm running a multimodal agent with fnc_ctx defined for function calling, and I have clearly outlined instructions in the system message to guide when to call the function.
Initially, everything worked as expected: the model followed instructions and invoked the function. However, after several tests, it started to hallucinate responses.
I noticed that in the Gemini plugin config we can set mode: "ANY" to force function calling, but I don't see any such configuration option available for multimodal agents. Is there a way to force function calling in this setup?
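For reference, this is roughly how forced function calling looks in the raw google-genai SDK. This is only a sketch using the non-live generate_content path, so it may not carry over to the multimodal agent, and get_weather is a made-up example tool:

from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a city",
    parameters=types.Schema(
        type="OBJECT",
        properties={"city": types.Schema(type="STRING")},
        required=["city"],
    ),
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])],
        # mode="ANY" forces the model to emit a function call
        # instead of answering in free text
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(mode="ANY")
        ),
    ),
)
print(response.function_calls)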
Hi, I've been facing issues with function calling while using the Gemini Live API as well.
I'm not sure if it's the same, but what happens is that the agent sometimes blurts out words like "tools_output", etc.
library versions I use:
- livekit-plugins-google==0.11.3
- livekit-agents==0.12.20
And the function returns text as an answer, but the agent doesn't seem to recognize the returned answer and says it didn't get any info. As a workaround, I explicitly set the string in the chat_ctx and ask it to generate a reply. But this is quite messy: the bot first says it couldn't find anything and then responds with the correct answer.
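For what it's worth, a rough sketch of that workaround in the 1.x agents API. lookup_order and fetch_order_status are made-up names, and the tool would be registered on the agent, e.g. via Agent(tools=[lookup_order]):

from livekit.agents import RunContext, function_tool

@function_tool
async def lookup_order(context: RunContext, order_id: str) -> str:
    """Look up the status of an order."""
    result = await fetch_order_status(order_id)  # hypothetical backend call
    # Workaround: don't rely on the model picking up the tool's return
    # value; explicitly ask the session to speak the result.
    await context.session.generate_reply(
        instructions=f"Tell the user: {result}",
    )
    return result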
@jayeshp19 can you help me out here?
@Denin-Siby we have experienced the same; it sometimes speaks "tool_outputs"...
Any Update on this? @jayeshp19 ?
+1
Any updates? @jayeshp19
More details:
It says it's having technical issues without actually calling the tools; there are no tool calls in the debug logs. It's just hallucinating. The same tools work fine with the OpenAI Realtime API.
We are still having these hallucination issues. Any update? Does Google know about these issues?
@khantseithu @edengby were you guys using it with Vertex AI?
No, only the Gemini Live API. It also keeps answering its own questions.
Answering its own questions might be a problem of not having proper noise cancellation. Can you try using it with headphones and check whether the same issue occurs?
AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        voice="Charon",
        temperature=0.0,
        top_k=1,
    ),
    turn_detection=EnglishModel(),
    min_endpointing_delay=0.7,
    max_endpointing_delay=2.0,
    vad=silero.VAD.load(activation_threshold=0.7),
)
This is how we defined the agent.
While starting the agent session using "agent_session.start", you can specify the room_input_options to turn noise cancellation on. Can you try that?
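Something like this (a sketch assuming the livekit-plugins-noise-cancellation package; BVC is LiveKit's enhanced noise cancellation and requires LiveKit Cloud):

from livekit.agents import RoomInputOptions
from livekit.plugins import noise_cancellation

await session.start(
    agent=VirtualRealtimeAgent(),
    room=ctx.room,
    # enable noise cancellation on the room's audio input
    room_input_options=RoomInputOptions(
        noise_cancellation=noise_cancellation.BVC(),
    ),
)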
this should be fixed in #2247
could anyone provide a repro for this? I'd be happy to confirm
@davidzhao I'm sorry, it's a private repo, so I can't share it. I can provide any configuration or logs that are needed. Should this be fixed in 1.0.20?
@edengby you don't have to share the repo. but please share the agent config as well as steps we could follow to reproduce
Hi team, I suspect this is a model problem in Google GenAI itself, so I've opened an issue on google-genai too, just to have a parallel lookup.
@edengby @Denin-Siby @BhavaniMallapragada you can also share some info/repro steps in that issue.
https://github.com/googleapis/python-genai/issues/789
cc: @davidzhao
# assumed imports for this snippet (livekit-agents 1.x)
from livekit.agents import AgentSession, RoomInputOptions, RoomOutputOptions
from livekit.plugins import google, silero
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    # Gemini Live realtime model, pinned to deterministic sampling
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        voice="Charon",
        temperature=0.0,
        top_k=1,
    ),
    turn_detection=EnglishModel(),
    min_endpointing_delay=0.7,
    max_endpointing_delay=2.0,
    vad=silero.VAD.load(activation_threshold=0.7),
)

room_input = RoomInputOptions()
room_output = RoomOutputOptions(transcription_enabled=True)

await session.start(
    agent=VirtualRealtimeAgent(),  # our custom Agent subclass
    room=ctx.room,
    room_input_options=room_input,
    room_output_options=room_output,
)
That's our configuration.
Now, in almost every interaction I get these hallucinations and it answers its own questions.
this is likely model behavior due to temperature=0.0, top_k=1. can you try to reproduce without those flags?
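For example, dropping the deterministic sampling flags and letting the model use its default sampling:

llm=google.beta.realtime.RealtimeModel(
    model="gemini-2.0-flash-exp",
    voice="Charon",
)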
Worked for me! Thanks. Still having the other issues:
- From time to time it says "dot" instead of the "." or reads out the function that needs to be run.
- The tone changes and sounds unnatural; it really damages the experience. Sometimes one word is high-pitched and the next is low.
reading the function seems like a model issue.
can you reproduce where it says "dot"? we'd be glad to take a look.
I was able to reproduce all these issues easily (hallucinations, answering its own questions): https://github.com/edengby/livekit-google-agent-bugs/
I used the LiveKit UI and Python agent boilerplates: https://github.com/livekit-examples/voice-pipeline-agent-python https://github.com/livekit-examples/voice-assistant-frontend
On models gemini-1.5-flash and gemini-2.5-flash.
The Gemini Live API saying "dot" after a few minutes of conversation:
https://github.com/livekit/agents/issues/2570#issuecomment-3015039647
We are using gemini-2.5-flash live with the LiveKit project and the conversation is not stable. Sometimes it works as per the question database, and sometimes it asks random questions. How can we fix this? The behavior is not as expected.
You can try tuning the temperature and top_k of the LLM. If it still happens, try the same on Google AI Studio with the same configs. If you can't replicate it on AI Studio, please share how you're providing questions to the LLM and the configs you've set.
Thanks, will try this
since this is a model issue, closing the issue here.