Text input interruption prevents response generation in Gemini Live API
Bug Description
When using livekit-agents[google] v1.2.18 with the Gemini Live API (2.5), sending text input while the model is generating a response successfully interrupts the model, but the model then generates no response at all. It stays stuck and does not respond to subsequent text inputs until audio input is received, which "unsticks" it.
Expected Behavior
After interrupting a model response with text input, the model should:
- Stop the current generation
- Process the interrupting text input
- Generate a response to the new text input
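To make the expected ordering concrete, here is a minimal sketch using only `asyncio`. `ToySession`, `send_text`, and `_generate` are hypothetical names for a toy model; this is not the LiveKit or Gemini API, only an illustration of "cancel the in-flight response, then answer the new text".

```python
from __future__ import annotations

import asyncio


class ToySession:
    """Hypothetical stand-in for a realtime session; not the LiveKit API."""

    def __init__(self) -> None:
        self._current: asyncio.Task[str] | None = None
        self.log: list[str] = []

    async def _generate(self, prompt: str) -> str:
        # Simulate a slow response that can be cancelled part-way through.
        self.log.append(f"start:{prompt}")
        await asyncio.sleep(0.05)
        self.log.append(f"done:{prompt}")
        return f"reply to {prompt}"

    async def send_text(self, prompt: str) -> asyncio.Task[str]:
        # Step 1: stop the current generation if one is still running.
        if self._current is not None and not self._current.done():
            self._current.cancel()
            try:
                await self._current
            except asyncio.CancelledError:
                self.log.append("interrupted")
        # Steps 2 and 3: take the new text and start a fresh response for it.
        self._current = asyncio.create_task(self._generate(prompt))
        return self._current


async def demo() -> tuple[str, list[str]]:
    session = ToySession()
    await session.send_text("first")
    await asyncio.sleep(0.01)  # let the first response get under way
    second = await session.send_text("second")  # interrupts "first"
    return await second, session.log


if __name__ == "__main__":
    reply, log = asyncio.run(demo())
    print(reply)
    print(log)
```

In this toy model the interrupting text always gets its own response; the bug described here corresponds to the last step never happening after the interruption.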
Reproduction Steps
1. Configure a Gemini Live 2.5 model:
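# Imports assumed from the names used below
from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import google

session = AgentSession(
    llm=google.realtime.RealtimeModel(
        model="gemini-live-2.5-flash",
        modalities=[types.Modality.AUDIO],
        api_version="v1",
    )
)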
3. Initiate a response from the model (either via audio or text input)
4. While the model is generating a response, interrupt it by sending user text input
5. **Observed behavior**: The model is interrupted but does not generate any response
6. Send additional text inputs
7. **Observed behavior**: Subsequent text inputs also do not receive responses
8. Send audio input
9. **Observed behavior**: Audio input "unsticks" the model and responses resume
Operating System
Any
Models Used
Gemini
Package Versions
├── livekit-agents[google] v1.2.18
│ ├── livekit v1.0.18
│ ├── livekit-api v1.0.7
│ │ ├── livekit-protocol v1.0.8
│ ├── livekit-blingfire v1.0.0
│ ├── livekit-protocol v1.0.8 (*)
│ └── livekit-plugins-google v1.2.18 (extra: google)
│ └── livekit-agents v1.2.18 (*)
├── livekit-api v1.0.7 (*)
├── livekit-plugins-noise-cancellation v0.2.5
│ └── livekit v1.0.18 (*)
Session/Room/Call IDs
No response
Proposed Solution
Additional Context
- Using the Google GenAI SDK directly works correctly, so the issue seems to be in the LiveKit Google plugin.
- Related code:
RealtimeSession.generate_reply() and RealtimeSession.update_chat_ctx() in realtime_api.py
Screenshots and Recordings
No response
I'm experiencing the same issue with the OpenAI Realtime API:
"@livekit/agents": "^1.0.21",
"@livekit/agents-plugin-openai": "^1.0.21",
"@livekit/rtc-node": "^0.13.21",
I'm using the JS SDK like so:
import {
  useChat,
} from '@livekit/components-react'
...
const chat = useChat()
...
const send = useCallback(
  async (message: string) => {
    if (chat.isSending) {
      return
    }
    return await chat.send(message)
  },
  [chat],
)
When I send a message while the agent is speaking, the agent gets interrupted, but roughly half the time it doesn't output a response.
I experimented with different turn detection types (null, semantic_vad, server_vad) without success.
I also tried overriding the session's inputOptions to handle the incoming text by interrupting and then generating a reply, but no luck either:
await session.start({
  agent,
  room: ctx.room,
  inputOptions: {
    textInputCallback: async (sess: voice.AgentSession, ev: voice.TextInputEvent) => {
      await sess.interrupt().await
      sess.generateReply({ userInput: ev.text })
    },
  },
})
Any hints would be highly appreciated 🙏