agents
await self.session.generate_reply is not working at all in Gemini Realtime
Basically, when calling any function tool in Gemini Realtime voice, I want to use generate_reply to speak before the tool returns. If I call it without await, the reply is only spoken after the tool's return value; but if I await it, I get the error below.
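The timing difference described here can be reproduced with plain asyncio, independent of LiveKit (a minimal sketch; `speak`, `tool_without_await`, and `tool_with_await` are hypothetical stand-ins, not LiveKit APIs): calling a coroutine without awaiting it only schedules a task, so it runs after the tool body has already returned, while awaiting runs it inline.

```python
import asyncio

events = []

async def speak(text):
    # Stand-in for session.generate_reply(); records when speech happens.
    events.append(f"spoke: {text}")

async def tool_without_await():
    # Not awaiting only schedules a task: it runs after the tool body
    # finishes, once control returns to the event loop.
    asyncio.ensure_future(speak("interim"))
    events.append("tool returned")

async def tool_with_await():
    # Awaiting runs the reply inline, before the tool returns.
    await speak("interim")
    events.append("tool returned")

async def main():
    await tool_without_await()
    await asyncio.sleep(0)  # give the pending task a chance to run
    without = list(events)
    events.clear()
    await tool_with_await()
    return without, list(events)

without, with_await = asyncio.run(main())
print(without)     # ['tool returned', 'spoke: interim']
print(with_await)  # ['spoke: interim', 'tool returned']
```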
(env) recro@Manishs-MacBook-Pro voice_agent_livekit_server % python Chef_voice_agent/main.py console
DEBUG:asyncio:Using selector: KqueueSelector
2025-04-27 12:09:01,436 - DEBUG asyncio - Using selector: KqueueSelector
==================================================
Livekit Agents - Console
==================================================
Press [Ctrl+B] to toggle between Text/Audio mode, [Q] to quit.
INFO:livekit.agents:starting worker
2025-04-27 12:09:01,437 - INFO livekit.agents - starting worker {"version": "1.0.17", "rtc-version": "1.0.6"}
INFO:livekit.agents:see tracing information at http://localhost:52105/debug
2025-04-27 12:09:01,439 - INFO livekit.agents - see tracing information at http://localhost:52105/debug
INFO:livekit.agents:initializing job runner
2025-04-27 12:09:01,440 - INFO livekit.agents - initializing job runner {"tid": 17135917}
INFO:livekit.agents:job runner initialized
2025-04-27 12:09:01,440 - INFO livekit.agents - job runner initialized {"tid": 17135917}
DEBUG:asyncio:Using selector: KqueueSelector
2025-04-27 12:09:01,440 - DEBUG asyncio - Using selector: KqueueSelector
[Debug] GreetingAgent.__init__ called
[Debug] Prompt instructions loaded
DEBUG:livekit.plugins.google:connecting to Gemini Realtime API...
2025-04-27 12:09:01,640 - DEBUG livekit.plugins.google - connecting to Gemini Realtime API...
[Debug] on_enter start
[Debug] current_stage set to greeting
[Debug] Found recipe: Paneer Butter Masala
INFO:google_genai.live:b'{\n "setupComplete": {}\n}\n'
2025-04-27 12:09:03,841 - INFO google_genai.live - b'{\n "setupComplete": {}\n}\n'
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:10,737 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=861 cached_content_token_count=None response_token_count=157 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1018 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=861)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=157)] tool_use_prompt_tokens_details=None traffic_type=None"}
[Debug] generate_reply for recipe greeting executed
DEBUG:livekit.agents:executing tool
2025-04-27 12:09:25,378 - DEBUG livekit.agents - executing tool {"function": "set_servings_and_list_ingredients", "arguments": "{\"servings\": 4}", "speech_id": "speech_8dd1356c2a6a"}
[Debug] set_servings_and_list_ingredients start with servings=4
[Debug] Valid servings received: 4
[Debug] userdata.servings set to 4
[Debug] calculate_ingredients start for 4 servings of Paneer Butter Masala
[Debug] scaling_factor calculated: 1.0
[Debug] Ingredient line added: - 250 g of Paneer (Indian Cottage Cheese)
[Debug] Ingredient line added: - 2 units of Onions (medium sized, finely chopped)
[Debug] Ingredient line added: - 3 units of Tomatoes (large, pureed)
[Debug] ingredient_text built: Okay, for 4 servings of Paneer Butter Masala, you will need:
- 250 g of Paneer (Indian Cottage Cheese)
- 2 units of Onions (medium sized, finely chopped)
- 3 units of Tomatoes (large, pureed)
/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py:1310: RuntimeWarning: coroutine 'RealtimeSession.update_options' was never awaited
self._rt_session.update_options(tool_choice=model_settings.tool_choice)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py:1328: RuntimeWarning: coroutine 'RealtimeSession.update_options' was never awaited
self._rt_session.update_options(tool_choice=ori_tool_choice)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
ERROR:livekit.agents:Error in _realtime_reply_task
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
2025-04-27 12:09:30,382 - ERROR livekit.agents - Error in _realtime_reply_task
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
2025-04-27 12:09:30,383 - ERROR asyncio - Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/recro/Documents/voice_agent_livekit_server/env/lib/python3.11/site-packages/livekit/agents/voice/agent_activity.py", line 1313, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
[Debug] generate_reply for ingredients executed
[Debug] current_stage updated to handoff_to_prep
DEBUG:livekit.agents:tools execution completed
2025-04-27 12:09:30,383 - DEBUG livekit.agents - tools execution completed {"speech_id": "speech_8dd1356c2a6a"}
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:30,991 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=1913 cached_content_token_count=None response_token_count=30 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1943 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=28), ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=1885)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=30)] tool_use_prompt_tokens_details=None traffic_type=None"}
DEBUG:livekit.plugins.google:usage metadata
2025-04-27 12:09:47,822 - DEBUG livekit.plugins.google - usage metadata {"usage_metadata": "prompt_token_count=1058 cached_content_token_count=None response_token_count=400 tool_use_prompt_token_count=None thoughts_token_count=None total_token_count=1458 prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=14), ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=1044)] cache_tokens_details=None response_tokens_details=[ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=400)] tool_use_prompt_tokens_details=None traffic_type=None"}
from function import load_prompt
import asyncio
import math
import time

from livekit.agents import (
    JobContext,
    WorkerOptions,
    cli,
    Agent,
    AgentSession,
    RunContext,
    function_tool,
    RoomInputOptions,
)


class GreetingAgent(Agent):
    def __init__(self):
        print("[Debug] GreetingAgent.__init__ called")
        super().__init__(instructions=load_prompt('greeting_agent_prompt.yaml'))
        print("[Debug] Prompt instructions loaded")

    async def on_enter(self):
        print("[Debug] on_enter start")
        userdata = self.session.userdata
        userdata.current_stage = "greeting"
        print(f"[Debug] current_stage set to {userdata.current_stage}")
        if not userdata.recipe:
            print("[Debug] No recipe found in userdata")
            await self.session.generate_reply(
                instructions="Say this to user : Hello! I'm your Recipe Chef, but I seem to be missing the recipe details right now."
            )
            print("[Debug] generate_reply for missing recipe executed")
            return None
        print(f"[Debug] Found recipe: {userdata.recipe.name}")
        await self.session.generate_reply(
            instructions=f"Say exact line to the user :- Hello! I'm your Recipe Chef. I'll help you make {userdata.recipe.name}. First, how many servings would you like to prepare?"
        )
        print("[Debug] generate_reply for recipe greeting executed")
        return None

    async def calculate_ingredients(self, recipe, servings: int, session) -> None:
        print(f"[Debug] calculate_ingredients start for {servings} servings of {recipe.name}")
        scaling_factor = servings / recipe.base_servings
        print(f"[Debug] scaling_factor calculated: {scaling_factor}")
        ingredients_list = [f"Okay, for {servings} servings of {recipe.name}, you will need:"]
        for ing in recipe.ingredients:
            scaled_quantity = ing.quantity * scaling_factor
            display_quantity = (
                int(scaled_quantity)
                if scaled_quantity == int(scaled_quantity)
                else round(scaled_quantity, 1)
                if scaled_quantity > 0.1
                else scaled_quantity
            )
            if ing.unit == "units" and display_quantity != int(display_quantity):
                display_quantity = math.ceil(scaled_quantity)
            line = f"- {display_quantity} {ing.unit} of {ing.name}"
            ingredients_list.append(line)
            print(f"[Debug] Ingredient line added: {line}")
        ingredient_text = "\n".join(ingredients_list)
        print(f"[Debug] ingredient_text built: {ingredient_text}")
        await self.session.generate_reply(instructions=f"Say this exact line to user : {ingredient_text}")
        print("[Debug] generate_reply for ingredients executed")
        return None

    @function_tool()
    async def set_servings_and_list_ingredients(self, servings: int, context: RunContext) -> None:
        print(f"[Debug] set_servings_and_list_ingredients start with servings={servings}")
        userdata = self.session.userdata
        session = self.session
        if not userdata.recipe:
            print("[Debug] No recipe data; sending error reply")
            await session.generate_reply(instructions="Say this to user : I seem to have misplaced the recipe details.")
            return "Error: Recipe data missing."
        if servings <= 0:
            print("[Debug] Invalid servings number; sending error reply")
            await self.session.generate_reply(instructions="Say this to user : Please provide a positive number for servings.")
            return "Error: Invalid servings number."
        print(f"[Debug] Valid servings received: {servings}")
        userdata.servings = servings
        print(f"[Debug] userdata.servings set to {userdata.servings}")
        # Calculate and reply with ingredients
        await self.calculate_ingredients(userdata.recipe, servings, session)
        userdata.current_stage = "handoff_to_prep"
        print(f"[Debug] current_stage updated to {userdata.current_stage}")
        return None
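As a side note, the display-quantity rounding inside calculate_ingredients can be exercised on its own (a standalone sketch of the same logic; `display_quantity` is a hypothetical helper, not part of the agent code):

```python
import math

def display_quantity(quantity: float, scaling_factor: float, unit: str):
    """Mirror of the rounding logic used in calculate_ingredients."""
    scaled = quantity * scaling_factor
    display = (
        int(scaled)
        if scaled == int(scaled)
        else round(scaled, 1) if scaled > 0.1 else scaled
    )
    # Countable units are rounded up to the next whole item.
    if unit == "units" and display != int(display):
        display = math.ceil(scaled)
    return display

print(display_quantity(250, 1.0, "g"))     # 250
print(display_quantity(2, 1.5, "units"))   # 3  (2 * 1.5 = 3.0 exactly)
print(display_quantity(3, 1.25, "units"))  # 4  (3.75 is rounded up)
```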
This isn't working for me either on any Gemini model.
Having troubles with it too. Couldn't point out any pattern
Not able to use generate_reply inside tool calls with the Gemini Realtime model. It works in the on_enter functions, but inside tool calls I always get this error:

future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at /home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
  File "/home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/home/ubuntu/livekit/lk-development-alpha/venv/lib/python3.9/site-packages/livekit/agents/voice/agent_activity.py", line 1385, in _realtime_reply_task
    generation_ev = await self._rt_session.generate_reply(
livekit.agents.llm.realtime.RealtimeError: generate_reply timed out waiting for generation_created event.
I am getting the exact same error for realtime Azure OpenAI. Maybe it is an issue when using realtime models
I'm pretty sure this is mentioned somewhere in the docs or in one of the examples but you need to call session.interrupt() to stop any current generation.
I'm pretty sure this is mentioned somewhere in the docs or in one of the examples but you need to call session.interrupt() to stop any current generation.
But I don't want to call session.interrupt(); it forces the agent to stop its current phrase abruptly, which doesn't sound natural.
I think it's that Gemini will wait and automatically reply to the tool outputs, so we cannot use await session.generate_reply() in a function call. Related to https://github.com/livekit/agents/issues/2407#issuecomment-2911280026
It doesn't work in a function call but should work in other places.
I have the same issue, but when I use azure openai realtime. Using openai realtime without azure seems to not produce this though
Not working here, any solution?
@MatheusRDG it's a bug in agents 1.2.2 and should be fixed by https://github.com/livekit/agents/pull/3017. It will be included in the next release soon; for now you can use 1.2.1.
@longcw Hi,
We've tried that, but if we call await generate_reply inside a function, we still get the same error.
If we don't await it, the intermediate message is spoken after the agent's response instead of before it.
(We are following the RAG example with realtime models, where we respond with "we are on it" while we fetch the response using the Gemini realtime model.)
What error did you get? And can you try agents 1.2.3, just released?
@longcw Hi, Just upgraded to 1.2.3, still we get the same error, here it is:
{"message": "Backend response: [REDACTED_MESSAGE_CONTENT]", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:38.989534+00:00"}
2025-08-05 03:28:41.560 - [MODULE_PLACEHOLDER] - ERROR - Error in _realtime_reply_task
Traceback (most recent call last):
File "[PATH]/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "[PATH]/voice/agent_activity.py", line 1752, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.
2025-08-05 03:28:41.562 - [MODULE_PLACEHOLDER] - INFO - EndpointLLM response: [REDACTED_MESSAGE_CONTENT]
2025-08-05 03:28:41.562 - [MODULE_PLACEHOLDER] - DEBUG - tools execution completed
{"message": "Error in _realtime_reply_task\nTraceback (most recent call last):\n File \"[PATH]/utils/log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"[PATH]/voice/agent_activity.py\", line 1752, in _realtime_reply_task\n generation_ev = await self._rt_session.generate_reply(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.", "level": "ERROR", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.560897+00:00"}
{"message": "EndpointLLM response: [REDACTED_MESSAGE_CONTENT]", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.562324+00:00"}
2025-08-05 03:28:41.902 - [MODULE_PLACEHOLDER] - INFO - Participant agent-[AGENT_ID] attributes changed: {'lk.agent.state': 'speaking'}
{"message": "Participant agent-[AGENT_ID] attributes changed: {'lk.agent.state': 'speaking'}", "level": "INFO", "name": "[MODULE_PLACEHOLDER]", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:41.902925+00:00"}
2025-08-05 03:28:44.497 - asyncio - ERROR - Task exception was never retrieved
future: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at [PATH]/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>
Traceback (most recent call last):
File "[PATH]/utils/log.py", line 16, in async_fn_logs
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "[PATH]/voice/agent_activity.py", line 1752, in _realtime_reply_task
generation_ev = await self._rt_session.generate_reply(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.
{"message": "Task exception was never retrieved\nfuture: <Task finished name='AgentActivity.realtime_reply' coro=<AgentActivity._realtime_reply_task() done, defined at [PATH]/utils/log.py:13> exception=RealtimeError('generate_reply timed out waiting for generation_created event.')>\nTraceback (most recent call last):\n File \"[PATH]/utils/log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"[PATH]/voice/agent_activity.py\", line 1752, in _realtime_reply_task\n generation_ev = await self._rt_session.generate_reply(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n[MODULE].RealtimeError: generate_reply timed out waiting for generation_created event.", "level": "ERROR", "name": "asyncio", "pid": [PID_PLACEHOLDER], "job_id": "[JOB_ID]", "timestamp": "2025-08-05T03:28:44.497090+00:00"}
And here is a schematic view of the function call:
@function_tool()
async def handle_interaction(self):
    send_task = asyncio.create_task(self.llm_instance.send_text_to_backend(self.user_input))
    await context.session.generate_reply(instructions=f"Say: {self.stall_message}")
    if self.user_end_message:
        return None
    response = await send_task
    return response
@dvirginz okay, I see. The Gemini realtime model doesn't support generate_reply in a function call; it's working in BLOCKING mode right now, which waits for a tool output before generating the next response.
related to https://github.com/livekit/agents/issues/2367
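The BLOCKING behavior described above can be modeled with plain asyncio as a small deadlock (a sketch with hypothetical names, not LiveKit's actual internals): the model side refuses to start a new generation until the tool has returned its output, so a generate_reply awaited inside the tool can only time out.

```python
import asyncio

async def main():
    tool_done = asyncio.Event()
    generation_started = asyncio.Event()

    async def model_loop():
        # BLOCKING tool behavior: the model will not start a new
        # generation until the pending tool call has returned its output.
        await tool_done.wait()
        generation_started.set()

    async def generate_reply(timeout: float = 0.1) -> str:
        # Stand-in for session.generate_reply(): waits for the model to
        # emit the "generation_created" event.
        await asyncio.wait_for(generation_started.wait(), timeout)
        return "generation_created"

    async def function_tool() -> str:
        try:
            # Deadlock: the model is waiting on our tool output,
            # and we are waiting on the model.
            await generate_reply()
            return "reply delivered"
        except asyncio.TimeoutError:
            return "generate_reply timed out"
        finally:
            tool_done.set()  # the tool output is finally returned

    model_task = asyncio.create_task(model_loop())
    result = await function_tool()
    await model_task
    return result

result = asyncio.run(main())
print(result)  # generate_reply timed out
```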
Ok, I see. That was the original request by the issue opener, right?
Any idea how we can implement similar logic? Like generating an “I’m on it” response while we fetch the answer?
Very interested in this functionality with Gemini live as well -- @longcw can you provide more context on your note """ it's working in BLOCKING mode right now that will wait for a tool output before generate next response."""? Is there a way to simulate/fake tool output?
I notice that await self.session.generate_reply() seems to work sometimes in function calls, but fails intermittently with this error.
Ok, I see. That was the original request by the issue opener, right?
yes I think so.
Any idea how we can implement similar logic? Like generating an “I’m on it” response while we fetch the answer?
It needs the NON_BLOCKING mode of Gemini tool calls (https://ai.google.dev/gemini-api/docs/live-tools#async-function-calling), which is not supported right now, but I'll take a look to see if we can add it in the next version.
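Under NON_BLOCKING scheduling the same kind of sketch no longer deadlocks (again with hypothetical names, not the real API): the model loop is free to start a generation while the tool call is still in flight, so an interim reply can be produced before the slow backend work finishes.

```python
import asyncio

async def main():
    generation_started = asyncio.Event()

    async def model_loop():
        # NON_BLOCKING tool behavior: the model may start a new
        # generation while the tool call is still in flight.
        generation_started.set()

    async def generate_reply(timeout: float = 0.1) -> str:
        await asyncio.wait_for(generation_started.wait(), timeout)
        return "generation_created"

    async def fetch_answer() -> str:
        # Stand-in for the slow backend/RAG lookup.
        await asyncio.sleep(0.01)
        return "backend answer"

    async def function_tool():
        ev = await generate_reply()    # succeeds: the model is not blocked on us
        answer = await fetch_answer()  # interim reply already played
        return ev, answer

    model_task = asyncio.create_task(model_loop())
    result = await function_tool()
    await model_task
    return result

result = asyncio.run(main())
print(result)  # ('generation_created', 'backend answer')
```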
@longcw Thank you!
@longcw Another flow where this happens is when you generate a reply and then call another generate reply. The same error will happen.
For example:
- In on_enter, generate a long welcome message
- Then define logic for DTMF that triggers another generate_reply
The same error will be raised.
This doesn’t happen in OAI Realtime.
@longcw This issue happens with Azure OpenAI realtime as well.
+1 for gemini. Same issue
Same problem here
gemini live pls fix b0ss
Any update on this?
We will expose tool_behavior and tool_response_scheduling for the Gemini Realtime API in the next release (https://github.com/livekit/agents/pull/3482). Then you can set tool_behavior="NON_BLOCKING" so the model won't wait for a tool output before starting a new generation, and you can call generate_reply in a function call.
Great solution thank you for the update @longcw
It happens to me as well, for the Gemini Live model gemini-2.5-flash-native-audio-preview-09-2025, with this simple code:
async def on_enter(self):
    self.session.generate_reply(
        instructions="Greet the user kindly."
    )
No other generate_reply calls anywhere. I'm using LiveKit Agents 1.2.12 with NON_BLOCKING:
session = AgentSession(
    turn_detection=MultilingualModel(),
    preemptive_generation=True,
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.5-flash-native-audio-preview-09-2025",
        tool_behavior=types.Behavior.NON_BLOCKING,
        temperature=0.0,
    ),
    userdata=caller,
)
Any ideas?
Cheers!
Ok, funny thing: I had the words "greet the user" in my instructions (it doesn't matter whether that's in the system prompt or the local one for the reply). When I switched that to "introduce yourself", it started to work correctly.
Gemini has its own ways, I guess.
@longcw Will the above fix address the same issue for Azure OpenAI realtime models as well?
I have the same issue, but when I use azure openai realtime. Using openai realtime without azure seems to not produce this though
@paul-vinogradov Were you able to find a solution for Azure OpenAI Realtime? I'm facing the same issue as well, and just because of this I have to resort to using Azure Speech Service as TTS, which incurs extra costs.