agents
await self.session.generate_reply is not working at all in Gemini Realtime
Basically, when calling any function tool in Gemini Realtime voice, I want to use generate_reply to speak before the tool returns. If I call it without await, the reply is only spoken after the tool's return value; but if I await it, I get the error below.
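The timing difference described here can be reproduced with plain asyncio, independent of LiveKit (a minimal sketch; `speak`, `tool_without_await`, and `tool_with_await` are hypothetical stand-ins, not LiveKit APIs): calling a coroutine without awaiting it only schedules a task, so it runs after the tool body has already returned, while awaiting runs it inline.

```python
import asyncio

events = []

async def speak(text):
    # Stand-in for session.generate_reply(); records when speech happens.
    events.append(f"spoke: {text}")

async def tool_without_await():
    # Not awaiting only schedules a task: it runs after the tool body
    # finishes, once control returns to the event loop.
    asyncio.ensure_future(speak("interim"))
    events.append("tool returned")

async def tool_with_await():
    # Awaiting runs the reply inline, before the tool returns.
    await speak("interim")
    events.append("tool returned")

async def main():
    await tool_without_await()
    await asyncio.sleep(0)  # give the pending task a chance to run
    without = list(events)
    events.clear()
    await tool_with_await()
    return without, list(events)

without, with_await = asyncio.run(main())
print(without)     # ['tool returned', 'spoke: interim']
print(with_await)  # ['spoke: interim', 'tool returned']
```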
(env) recro@Manishs-MacBook-Pro voice_agent_livekit_server % python Chef_voice_agent/main.py console
DEBUG:asyncio:Using selector: KqueueSelector
2025-04-27 12:09:01,436 - DEBUG asyncio - Using selector: KqueueSelector
==================================================
Livekit Agents - Console
==================================================
Press [Ctrl+B] to toggle between Text/Audio mode, [Q] to quit.
INFO:livekit.agents:starting worker
2025-04-27 12:09:01,437 - INFO livekit.agents - starting worker {"version": "1.0.17", "rtc-version": "1.0.6"}
INFO:livekit.agents:see tracing information at http://localhost:52105/debug
2025-04-27 12:09:01,439 - INFO livekit.agents - see tracing information at http://localhost:52105/debug
INFO:livekit.agents:initializing job runner
2025-04-27 12:09:01,440 - INFO livekit.agents - initializing job runner {"tid": 17135917}
INFO:livekit.agents:job runner initialized
2025-04-27 12:09:01,440 - INFO livekit.agents - job runner initialized {"tid": 17135917}
DEBUG:asyncio:Using selector: KqueueSelector
2025-04-27 12:09:01,440 - DEBUG asyncio - Using selector: KqueueSelector
[Debug] GreetingAgent.__init__ called
[Debug] Prompt instructions loaded
DEBUG:livekit.plugins.google:connecting to Gemini Realtime API...
2025-04-27 12:09:01,640 - DEBUG livekit.plugins.google - connecting to Gemini Realtime API...
[Debug] on_enter start
[Debug] current_stage set to greeting
[Debug] Found recipe: Paneer Butter Masala
INFO:google_genai.live:b'{\n "setupComplete": {}\n}\n'
2025-04-27 12:09:03,841 - INFO google_genai.live - b'{\n "setupComplete": {}\n}\n'
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:10,737 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=861 cached_content_token_count=None response_token_count=157 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1018 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=861)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=157)] tool_use_prompt_tokens_details=None traffic_type=None"}
[Debug] generate_reply for recipe greeting executed
DEBUG:livekit.agents:executing tool
2025-04-27 12:09:25,378 - DEBUG livekit.agents - executing tool {"function": "set_servings_and_list_ingredients", "arguments": "{\"servings\": 4}", "speech_id": "speech_8dd1356c2a6a"}
[Debug] set_servings_and_list_ingredients start with servings=4
[Debug] Valid servings received: 4
[Debug] userdata.servings set to 4
[Debug] calculate_ingredients start for 4 servings of Paneer Butter Masala
[Debug] scaling_factor calculated: 1.0
[Debug] Ingredient line added: - 250 g of Paneer (Indian Cottage Cheese)
[Debug] Ingredient line added: - 2 units of Onions (medium sized, finely chopped)
[Debug] Ingredient line added: - 3 units of Tomatoes (large, pureed)
[Debug] ingredient_text built: Okay, for 4 servings of Paneer Butter Masala, you will need:
- 250 g of Paneer (Indian Cottage Cheese)
- 2 units of Onions (medium sized, finely chopped)
- 3 units of Tomatoes (large, pureed)
/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py:1310: RuntimeWarning: coroutine 'RealtimeSession.update_options' was never awaited
self._rt_session.update_options(tool_choice=model_settings.tool_choice)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py:1328: RuntimeWarning: coroutine 'RealtimeSession.update_options' was never awaited
self._rt_session.update_options(tool_choice=ori_tool_choice)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
ERROR:livekit.agents:Error in _realtime_reply_task
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
2025-04-27 12:09:30,382 - ERROR livekit.agents - Error in _realtime_reply_task
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
2025-04-27 12:09:30,383 - ERROR asyncio - Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
[Debug] generate_reply for ingredients executed
[Debug] current_stage updated to handoff_to_prep
DEBUG:livekit.agents:tools execution completed
2025-04-27 12:09:30,383 - DEBUG livekit.agents - tools execution completed {"speech_id": "speech_8dd1356c2a6a"}
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:30,991 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=1913 cached_content_token_count=None response_token_count=30 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1943 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=28), ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=1885)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=30)] tool_use_prompt_tokens_details=None traffic_type=None"}
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:47,822 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=1058 cached_content_token_count=None response_token_count=400 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1458 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=14), ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=1044)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=400)] tool_use_prompt_tokens_details=None traffic_type=None"}
from function import load_prompt
import asyncio
import math
import time

from livekit.agents import (
    JobContext,
    WorkerOptions,
    cli,
    Agent,
    AgentSession,
    RunContext,
    function_tool,
    RoomInputOptions,
)


class GreetingAgent(Agent):
    def __init__(self):
        print("[Debug] GreetingAgent.__init__ called")
        super().__init__(instructions=load_prompt('greeting_agent_prompt.yaml'))
        print("[Debug] Prompt instructions loaded")

    async def on_enter(self):
        print("[Debug] on_enter start")
        userdata = self.session.userdata
        userdata.current_stage = "greeting"
        print(f"[Debug] current_stage set to {userdata.current_stage}")
        if not userdata.recipe:
            print("[Debug] No recipe found in userdata")
            await self.session.generate_reply(
                instructions="Say this to user : Hello! I'm your Recipe Chef, but I seem to be missing the recipe details right now."
            )
            print("[Debug] generate_reply for missing recipe executed")
            return None
        print(f"[Debug] Found recipe: {userdata.recipe.name}")
        await self.session.generate_reply(
            instructions=f"Say exact line to the user :- Hello! I'm your Recipe Chef. I'll help you make {userdata.recipe.name}. First, how many servings would you like to prepare?"
        )
        print("[Debug] generate_reply for recipe greeting executed")
        return None

    async def calculate_ingredients(self, recipe, servings: int, session) -> None:
        print(f"[Debug] calculate_ingredients start for {servings} servings of {recipe.name}")
        scaling_factor = servings / recipe.base_servings
        print(f"[Debug] scaling_factor calculated: {scaling_factor}")
        ingredients_list = [f"Okay, for {servings} servings of {recipe.name}, you will need:"]
        for ing in recipe.ingredients:
            scaled_quantity = ing.quantity * scaling_factor
            display_quantity = (
                int(scaled_quantity)
                if scaled_quantity == int(scaled_quantity)
                else round(scaled_quantity, 1)
                if scaled_quantity > 0.1
                else scaled_quantity
            )
            if ing.unit == "units" and display_quantity != int(display_quantity):
                display_quantity = math.ceil(scaled_quantity)
            line = f"- {display_quantity} {ing.unit} of {ing.name}"
            ingredients_list.append(line)
            print(f"[Debug] Ingredient line added: {line}")
        ingredient_text = "\n".join(ingredients_list)
        print(f"[Debug] ingredient_text built: {ingredient_text}")
        await self.session.generate_reply(instructions=f"Say this exact line to user : {ingredient_text}")
        print("[Debug] generate_reply for ingredients executed")
        return None

    @function_tool()
    async def set_servings_and_list_ingredients(self, servings: int, context: RunContext) -> None:
        print(f"[Debug] set_servings_and_list_ingredients start with servings={servings}")
        userdata = self.session.userdata
        session = self.session
        if not userdata.recipe:
            print("[Debug] No recipe data; sending error reply")
            await session.generate_reply(instructions="Say this to user : I seem to have misplaced the recipe details.")
            return "Error: Recipe data missing."
        if servings <= 0:
            print("[Debug] Invalid servings number; sending error reply")
            await self.session.generate_reply(instructions="Say this to user : Please provide a positive number for servings.")
            return "Error: Invalid servings number."
        print(f"[Debug] Valid servings received: {servings}")
        userdata.servings = servings
        print(f"[Debug] userdata.servings set to {userdata.servings}")
        # Calculate and reply with ingredients
        await self.calculate_ingredients(userdata.recipe, servings, session)
        userdata.current_stage = "handoff_to_prep"
        print(f"[Debug] current_stage updated to {userdata.current_stage}")
        return None
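As a side note, the display-quantity rounding inside calculate_ingredients can be exercised on its own (a standalone sketch of the same logic; `display_quantity` is a hypothetical helper, not part of the agent code):

```python
import math

def display_quantity(quantity: float, scaling_factor: float, unit: str):
    """Mirror of the rounding logic used in calculate_ingredients."""
    scaled = quantity * scaling_factor
    display = (
        int(scaled)
        if scaled == int(scaled)
        else round(scaled, 1) if scaled > 0.1 else scaled
    )
    # Countable units are rounded up to the next whole item.
    if unit == "units" and display != int(display):
        display = math.ceil(scaled)
    return display

print(display_quantity(250, 1.0, "g"))     # 250
print(display_quantity(2, 1.5, "units"))   # 3  (2 * 1.5 = 3.0 exactly)
print(display_quantity(3, 1.25, "units"))  # 4  (3.75 is rounded up)
```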
This isn't working for me either on any Gemini model.
Having troubles with it too. Couldn't point out any pattern
Not able to use generate_reply inside tool calls with the Gemini Realtime model. It works in the on_enter functions, but inside tool calls I always get this error:

future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
  File "/home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/voice/agent_activity.py", line 1385, in _realtime_reply_task
    generation_ev = await self._rt_session.generate_reply(
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
I am getting the exact same error for realtime Azure OpenAI. Maybe it is an issue when using realtime models
I'm pretty sure this is mentioned somewhere in the docs or in one of the examples but you need to call session.interrupt() to stop any current generation.
I'm pretty sure this is mentioned somewhere in the docs or in one of the examples but you need to call session.interrupt() to stop any current generation.
But I don't want to call session.interrupt(); it forces the agent to stop its current phrase abruptly, which doesn't sound natural.
I think it's that Gemini will wait and automatically reply to the tool outputs, so we cannot use await session.generate_reply() in a function call. Related to https://github.com/livekit/agents/issues/2407#issuecomment-2911280026
It doesn't work in a function call but should work in other places.
I have the same issue, but when I use azure openai realtime. Using openai realtime without azure seems to not produce this though
Not working here, any solution?
@MatheusRDG it's a bug in agents 1.2.2 and should be fixed by https://github.com/livekit/agents/pull/3017. It will be included in the next release soon; for now you can use 1.2.1.
@longcw Hi,
We've tried that, but if we call await generate_reply inside a function, we still get the same error.
If we don't await it, the intermediate message is spoken after the agent's response instead of before it.
(We are following the RAG example with realtime models, where we respond with "we are on it" while we fetch the response using the Gemini realtime model.)
What error did you get? And can you try agents 1.2.3, just released?
@longcw Hi, Just upgraded to 1.2.3, still we get the same error, here it is:
{"message": "Backend response: [REDACTED_MESSAGE_CONTENT]", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:38.989534+00:00"}
2025-08-05 03:28:41.560 - [MODULE_PLACEHOLDER] - ERROR - Error in _realtime_reply_task
Traceback (most recent call last):
File "[PATH]/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "[PATH]/voice/agent_activity.py", line 1752, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.
2025-08-05 03:28:41.562 - [MODULE_PLACEHOLDER] - INFO - EndpointLLM response: [REDACTED_MESSAGE_CONTENT]
2025-08-05 03:28:41.562 - [MODULE_PLACEHOLDER] - DEBUG - tools execution completed
{"message": "Error in _realtime_reply_task\nTraceback (most recent call last):\n File \"[PATH]/utils/log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"[PATH]/voice/agent_activity.py\", line 1752, in _realtime_reply_task\n generation_ev = await self._rt_session.generate_reply(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.", "level": "ERROR", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.560897+00:00"}
{"message": "EndpointLLM response: [REDACTED_MESSAGE_CONTENT]", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.562324+00:00"}
2025-08-05 03:28:41.902 - [MODULE_PLACEHOLDER] - INFO - Participant agent-[AGENT_ID] attributes changed: {'lk.agent.state': 'speaking'}
{"message": "Participant agent-[AGENT_ID] attributes changed: {'lk.agent.state': 'speaking'}", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.902925+00:00"}
2025-08-05 03:28:44.497 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at [PATH]/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "[PATH]/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "[PATH]/voice/agent_activity.py", line 1752, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.
{"message": "Task exception was never retrieved\nfuture: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at [PATH]/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>\nTraceback (most recent call last):\n File \"[PATH]/utils/log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"[PATH]/voice/agent_activity.py\", line 1752, in _realtime_reply_task\n generation_ev = await self._rt_session.generate_reply(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.", "level": "ERROR", "name": "asyncio", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:44.497090+00:00"}
And here is a schematic view of the function call:
@function_tool()
async def handle_interaction(self):
    send_task = asyncio.create_task(self.llm_instance.send_text_to_backend(self.user_input))
    await context.session.generate_reply(instructions=f"Say: {self.stall_message}")
    if self.user_end_message:
        return None
    response = await send_task
    return response
@dvirginz okay, I see. The Gemini realtime model doesn't support generate_reply in a function call; it's working in BLOCKING mode right now, which waits for a tool output before generating the next response.
related to https://github.com/livekit/agents/issues/2367
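The BLOCKING behavior described above can be modeled with plain asyncio as a small deadlock (a sketch with hypothetical names, not LiveKit's actual internals): the model side refuses to start a new generation until the tool has returned its output, so a generate_reply awaited inside the tool can only time out.

```python
import asyncio

async def main():
    tool_done = asyncio.Event()
    generation_started = asyncio.Event()

    async def model_loop():
        # BLOCKING tool behavior: the model will not start a new
        # generation until the pending tool call has returned its output.
        await tool_done.wait()
        generation_started.set()

    async def generate_reply(timeout: float = 0.1) -> str:
        # Stand-in for session.generate_reply(): waits for the model to
        # emit the "generation_created" event.
        await asyncio.wait_for(generation_started.wait(), timeout)
        return "generation_created"

    async def function_tool() -> str:
        try:
            # Deadlock: the model is waiting on our tool output,
            # and we are waiting on the model.
            await generate_reply()
            return "reply delivered"
        except asyncio.TimeoutError:
            return "generate_reply timed out"
        finally:
            tool_done.set()  # the tool output is finally returned

    model_task = asyncio.create_task(model_loop())
    result = await function_tool()
    await model_task
    return result

result = asyncio.run(main())
print(result)  # generate_reply timed out
```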
Ok, I see. That was the original request by the issue opener, right?
Any idea how we can implement similar logic? Like generating an “I’m on it” response while we fetch the answer?
Very interested in this functionality with Gemini live as well -- @longcw can you provide more context on your note """ it's working in BLOCKING mode right now that will wait for a tool output before generate next response."""? Is there a way to simulate/fake tool output?
I notice that await self.session.generate_reply() seems to work sometimes in function calls, but fails intermittently with this error.
Ok, I see. That was the original request by the issue opener, right?
yes I think so.
Any idea how we can implement similar logic? Like generating an “I’m on it” response while we fetch the answer?
It needs the NON_BLOCKING mode of Gemini tool calls (https://ai.google.dev/gemini-api/docs/live-tools#async-function-calling), which is not supported right now, but I'll take a look to see if we can add it in the next version.
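Under NON_BLOCKING scheduling the same kind of sketch no longer deadlocks (again with hypothetical names, not the real API): the model loop is free to start a generation while the tool call is still in flight, so an interim reply can be produced before the slow backend work finishes.

```python
import asyncio

async def main():
    generation_started = asyncio.Event()

    async def model_loop():
        # NON_BLOCKING tool behavior: the model may start a new
        # generation while the tool call is still in flight.
        generation_started.set()

    async def generate_reply(timeout: float = 0.1) -> str:
        await asyncio.wait_for(generation_started.wait(), timeout)
        return "generation_created"

    async def fetch_answer() -> str:
        # Stand-in for the slow backend/RAG lookup.
        await asyncio.sleep(0.01)
        return "backend answer"

    async def function_tool():
        ev = await generate_reply()    # succeeds: the model is not blocked on us
        answer = await fetch_answer()  # interim reply already played
        return ev, answer

    model_task = asyncio.create_task(model_loop())
    result = await function_tool()
    await model_task
    return result

result = asyncio.run(main())
print(result)  # ('generation_created', 'backend answer')
```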
@longcw Thank you!
@longcw Another flow where this happens is when you generate a reply and then call another generate reply. The same error will happen.
For example:
- In on_enter, generate a long welcome message
- Then define logic for DTMF that triggers another generate_reply
The same error will be raised.
This doesn’t happen in OAI Realtime.
@longcw This issue happens with Azure OpenAI realtime as well.
+1 for gemini. Same issue
Same problem here
gemini live pls fix b0ss
Any update on this?
We will expose tool_behavior and tool_response_scheduling for the Gemini Realtime API in the next release (https://github.com/livekit/agents/pull/3482). Then you can set tool_behavior="NON_BLOCKING" so the model won't wait for a tool output before starting a new generation, and you can call generate_reply in a function call.
Great solution thank you for the update @longcw
It happens to me as well, for the Gemini Live model gemini-2.5-flash-native-audio-preview-09-2025, with this simple code:
async def on_enter(self):
    self.session.generate_reply(
        instructions="Greet the user kindly."
    )
No other generate_reply calls anywhere. I'm using LiveKit Agents 1.2.12 with NON_BLOCKING:
session = AgentSession(
    turn_detection=MultilingualModel(),
    preemptive_generation=True,
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        tool_behavior=types.Behavior.NON_BLOCKING,
        temperature=0.0,
    ),
    userdata=caller,
)
Any ideas?
Cheers!
Ok, funny thing: I had the words "greet the user" in my instructions (it doesn't matter whether that's in the system prompt or the local one for the reply). When I switched that to "introduce yourself", it started to work correctly.
Gemini has its own ways, I guess.
@longcw Will the above fix address the same issue for Azure OpenAI realtime models as well?
I have the same issue, but when I use azure openai realtime. Using openai realtime without azure seems to not produce this though
@paul-vinogradov Were you able to find a solution for Azure OpenAI Realtime? I'm facing the same issue as well, and just because of this I have to resort to using Azure Speech Service as TTS, which incurs extra costs.