
LiveKit multimodal agent with Gemini skipping function calls and hallucinating

Open BhavaniMallapragada opened this issue 8 months ago • 19 comments

We are experiencing hallucinations. Sometimes it works perfectly, but sometimes it skips function calls and hallucinates. Is there any way we can make it stop hallucinating?

BhavaniMallapragada avatar Apr 10 '25 04:04 BhavaniMallapragada

Hey, have you tried the latest version? Can you provide more details?

jayeshp19 avatar Apr 11 '25 12:04 jayeshp19

Hello,

I'm currently using the following versions:

livekit-plugins-google: 0.11.2

livekit-agents: 0.12.18

Model: gemini-2.0-flash-exp

I'm running a multimodal agent with func_ctx defined for function calling, and I have clearly outlined instructions in the system message to guide when to call the function.

Initially, everything worked as expected: the model followed instructions and invoked the function. However, after several tests, it started to hallucinate responses.

I noticed that in the Gemini plugin config we can set the mode "ANY" to force function calling, but I don't see any such configuration option available for multimodal agents. Is there a way to force function calling in this setup?
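For reference, with the plain google-genai SDK the "ANY" mode is set through a tool_config. A minimal sketch follows; the `get_weather` tool is purely illustrative, and whether the LiveKit multimodal agent exposes this option is exactly the open question here:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

# Hypothetical example tool, declared for illustration only.
get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)

config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[get_weather])],
    # mode="ANY" forces the model to respond with a function call
    # rather than free text, which is the behavior being asked about.
    tool_config=types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(mode="ANY")
    ),
)
```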

BhavaniMallapragada avatar Apr 12 '25 16:04 BhavaniMallapragada

Hi, I've been facing issues with function calling while using the gemini-live api as well.

I'm not sure if it's the same, but what happens is that the agent sometimes shouts out words like: "tools_output" etc.

library versions I use:

  • livekit-plugins-google==0.11.3
  • livekit-agents==0.12.20

Also, the function returns text as its answer, but the agent doesn't seem to recognize the returned value and says it didn't get any info. As a workaround, I explicitly set the string in the chat_ctx and ask it to generate a reply. But this is quite messy: the bot first says it couldn't find anything and then responds with the correct answer.
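In the current (1.x) livekit-agents API, that workaround looks roughly like the sketch below; `fetch_from_db` is a hypothetical helper, and the exact hook point may differ in the 0.12 multimodal agent used here:

```python
from livekit.agents import RunContext, function_tool

@function_tool
async def lookup_info(context: RunContext, query: str) -> str:
    """Hypothetical example tool that looks up information for the user."""
    result = await fetch_from_db(query)  # hypothetical helper
    # Workaround described above: instead of relying on the model to read
    # the tool output itself, explicitly ask the session to speak the text.
    context.session.generate_reply(
        instructions=f"Answer the user using this information: {result}"
    )
    return result
```

This doubles up the turn (the model may still speak before the injected reply), which matches the messiness described above.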

Denin-Siby avatar Apr 14 '25 14:04 Denin-Siby

@jayeshp19 can you help me out here?

Denin-Siby avatar Apr 14 '25 16:04 Denin-Siby

@Denin-Siby we experienced the same thing; it sometimes speaks tool_outputs...

BhavaniMallapragada avatar Apr 16 '25 17:04 BhavaniMallapragada

Any Update on this? @jayeshp19 ?

Denin-Siby avatar Apr 21 '25 20:04 Denin-Siby

+1

khantseithu avatar Apr 22 '25 07:04 khantseithu

Any updates? @jayeshp19

More details:

[image attachment]

It says it's having technical issues without actually calling the tools; there are no tool calls in the debug logs. It's just hallucinating. The same tool works fine with the OpenAI Realtime API.

khantseithu avatar Apr 24 '25 08:04 khantseithu

We are still seeing these hallucinations. Any update? Is Google aware of the issue?

edengby avatar May 09 '25 08:05 edengby

@khantseithu @edengby were you guys using it with Vertex AI?

Denin-Siby avatar May 09 '25 08:05 Denin-Siby

No, only the Gemini Live API. It also keeps answering its own questions.

edengby avatar May 09 '25 08:05 edengby

Answering its own questions might be a problem of not having proper noise cancellation. Can you try using it with headphones and check whether the same issue occurs?

Denin-Siby avatar May 09 '25 08:05 Denin-Siby

AgentSession(
    llm=google.beta.realtime.RealtimeModel(model="gemini-2.0-flash-exp", voice="Charon", temperature=0.0, top_k=1),
    turn_detection=EnglishModel(),
    min_endpointing_delay=0.7,
    max_endpointing_delay=2.0,
    vad=silero.VAD.load(activation_threshold=0.7),
)

This is how we defined the agent.

edengby avatar May 09 '25 08:05 edengby

While starting the agent session with "agent_session.start", you can specify room_input_options to turn noise cancellation on. Can you try that?
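For reference, that option can be passed like this; a sketch assuming the LiveKit noise cancellation plugin (`livekit-plugins-noise-cancellation`) is installed and the agent class name from the config shared above:

```python
from livekit.agents import RoomInputOptions
from livekit.plugins import noise_cancellation

await session.start(
    agent=VirtualRealtimeAgent(),
    room=ctx.room,
    # BVC() enables LiveKit's background voice cancellation, which helps
    # keep the agent's own audio from being re-ingested as user speech
    # (one possible cause of it "answering its own questions").
    room_input_options=RoomInputOptions(
        noise_cancellation=noise_cancellation.BVC()
    ),
)
```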

Denin-Siby avatar May 09 '25 08:05 Denin-Siby

this should be fixed in #2247

could anyone provide a repro for this? i'd be happy to confirm

davidzhao avatar May 09 '25 18:05 davidzhao

@davidzhao I'm sorry, it's a private repo, so I can't. I can provide any configuration or logs that are needed. Should it be fixed in 1.0.20?

edengby avatar May 09 '25 18:05 edengby

@edengby you don't have to share the repo. but please share the agent config as well as steps we could follow to reproduce

davidzhao avatar May 09 '25 20:05 davidzhao

Hi team, I opened an issue on google-genai too; I suspect this is a model problem in google-genai itself. I've opened the issue so it can be looked at in parallel.

@edengby @Denin-Siby @BhavaniMallapragada please also share info/repro steps in that issue.

https://github.com/googleapis/python-genai/issues/789

cc: @davidzhao

shashwatsanket997 avatar May 09 '25 21:05 shashwatsanket997

session = AgentSession(
            llm=google.beta.realtime.RealtimeModel(model="gemini-2.0-flash-exp", voice="Charon",temperature=0.0, top_k=1),
            turn_detection=EnglishModel(),
            min_endpointing_delay=0.7,
            max_endpointing_delay=2.0,
            vad=silero.VAD.load(activation_threshold=0.7),
        )

room_input = RoomInputOptions()
room_output = RoomOutputOptions(transcription_enabled=True)

await session.start(
            agent=VirtualRealtimeAgent(),
            room=ctx.room,
            room_input_options=room_input,
            room_output_options=room_output,
        )

That's our configuration.

Now, almost every interaction with it produces these hallucinations, and it answers its own questions.

edengby avatar May 10 '25 07:05 edengby

this is likely model behavior due to temperature=0.0, top_k=1. can you try to reproduce without those flags?
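A toy sampler (not the real Gemini decoder) illustrates why those flags matter: temperature=0.0 or top_k=1 each collapse decoding to pure greedy argmax, so the model can never choose a slightly lower-scored continuation such as emitting a function call, and it can get stuck in degenerate loops:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Toy next-token sampler showing how temperature and top_k shape decoding."""
    rng = rng or random.Random(0)
    # Sort candidate tokens by score, highest first.
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]  # top_k=1 leaves only the argmax token
    if temperature == 0.0 or len(items) == 1:
        return items[0][0]  # greedy: always the single highest-logit token
    # Otherwise sample from the temperature-scaled softmax.
    probs = [math.exp(v / temperature) for _, v in items]
    r = rng.random() * sum(probs)
    acc = 0.0
    for (tok, _), p in zip(items, probs):
        acc += p
        if acc >= r:
            return tok
    return items[-1][0]

logits = {"call_tool": 2.0, "answer_directly": 1.9, "other": 0.5}
# With top_k=1 the sampler can only ever emit the argmax token:
assert all(sample_token(logits, temperature=0.0, top_k=1) == "call_tool"
           for _ in range(5))
```

With default sampling the model can still pick "answer_directly" occasionally; with top_k=1 it never can, which is consistent with the fix of dropping those flags.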

davidzhao avatar May 26 '25 21:05 davidzhao

Worked for me! Thanks. Still having the other issues:

  • From time to time it says "dot" instead of pausing at a ".", or reads out the function that needs to be run.
  • The tone changes and sounds unnatural, which really damages the experience. Sometimes one word is high-pitched and the next is low.

edengby avatar May 28 '25 19:05 edengby

reading the function seems like a model issue.

can you reproduce the case where it says "dot"? we'd be glad to take a look.

davidzhao avatar May 31 '25 06:05 davidzhao

I was able to reproduce all these issues easily (hallucinations, answering its own questions): https://github.com/edengby/livekit-google-agent-bugs/

I used Livekit UI and Python agent boilerplates: https://github.com/livekit-examples/voice-pipeline-agent-python https://github.com/livekit-examples/voice-assistant-frontend

On models gemini-1.5-flash and gemini-2.5-flash.

edengby avatar Jun 21 '25 06:06 edengby

Gemini live api saying "dot", after a few minutes of conversation

https://github.com/livekit/agents/issues/2570#issuecomment-3015039647

ayushkumar1610 avatar Jun 28 '25 07:06 ayushkumar1610

We are using gemini-2.5-flash live with the LiveKit project and the conversation is not stable. Sometimes it works as per the question database, sometimes it asks random questions. How can we fix this? The behavior is not as expected.

DivyaSingh122 avatar Oct 29 '25 06:10 DivyaSingh122

You can try tuning the temperature and top_k of the LLM. If it still happens, try the same prompts on Google AI Studio with the same config. If you can't replicate it in AI Studio, please share how you're providing questions to the LLM and the configs you've set.
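The same sampling settings can also be checked outside LiveKit with the google-genai SDK, to test whether the instability is model-side; a sketch (the model name, prompt, and parameter values are illustrative, and it requires a GOOGLE_API_KEY in the environment):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Ask the next question from the script.",
    # Mirror the temperature/top_k you use in the LiveKit agent so the
    # comparison against AI Studio is apples-to-apples.
    config=types.GenerateContentConfig(temperature=0.2, top_k=40),
)
print(resp.text)
```

If this standalone call shows the same random questions, the issue is in the model/config rather than the LiveKit integration.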

ayushkumar1610 avatar Oct 29 '25 07:10 ayushkumar1610

Thanks, will try this

DivyaSingh122 avatar Oct 29 '25 09:10 DivyaSingh122

since this is a model issue, closing the issue here.

davidzhao avatar Oct 31 '25 05:10 davidzhao