Text input interruption prevents response generation in Gemini Live API

Open frankfarzan opened this issue 2 months ago • 1 comments

Bug Description

When using livekit-agents[google] v1.2.18 with Gemini Live API (2.5), if a user sends text input while the model is generating a response, it successfully interrupts the model but the model does not generate any response. The model remains stuck and will not respond to subsequent text inputs until audio input is received, which "unstucks" the model.

Expected Behavior

After interrupting a model response with text input, the model should:

1. Stop the current generation
2. Process the interrupting text input
3. Generate a response to the new text input

Reproduction Steps

1. Configure a Gemini Live 2.5 model:


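from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import google

# Realtime session backed by Gemini Live 2.5, audio output only
session = AgentSession(
    llm=google.realtime.RealtimeModel(
        model="gemini-live-2.5-flash",
        modalities=[types.Modality.AUDIO],
        api_version="v1",
    )
)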


2. Initiate a response from the model (either via audio or text input)
3. While the model is generating a response, interrupt it by sending user text input
4. **Observed behavior**: The model is interrupted but does not generate any response
5. Send additional text inputs
6. **Observed behavior**: Subsequent text inputs also do not receive responses
7. Send audio input
8. **Observed behavior**: Audio input "unstucks" the model and responses resume
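
The steps above can be checked automatically with a small timeout-based harness. This is only a sketch: `find_unanswered` and the `send_text` callable are hypothetical names, not part of livekit-agents, and `send_text` is assumed to be an async wrapper you write yourself that sends one text input and returns once the model has replied. `FakeStuckSession` is a toy stand-in that imitates the reported behavior so the harness can run without a real session.

```python
import asyncio
from typing import Awaitable, Callable, List


async def find_unanswered(
    send_text: Callable[[str], Awaitable[str]],  # hypothetical wrapper, see above
    messages: List[str],
    timeout_s: float = 5.0,
) -> List[str]:
    """Send each message in order; return those with no reply within timeout_s."""
    unanswered = []
    for text in messages:
        try:
            await asyncio.wait_for(send_text(text), timeout=timeout_s)
        except asyncio.TimeoutError:
            unanswered.append(text)
    return unanswered


class FakeStuckSession:
    """Toy model of the reported bug. Not LiveKit code."""

    def __init__(self) -> None:
        self.generating = False
        self.stuck = False

    async def send_text(self, text: str) -> str:
        if self.generating:
            # Text arriving mid-generation interrupts and wedges the model.
            self.generating = False
            self.stuck = True
        if self.stuck:
            await asyncio.Event().wait()  # never replies
        self.generating = True  # treat the reply as still being spoken
        return f"reply to {text!r}"

    async def send_audio(self) -> str:
        # Audio input clears the stuck state, as described in the last step.
        self.stuck = False
        self.generating = False
        return "reply to audio"
```

Against a real session, any message returned by `find_unanswered` after an interrupting text input would correspond to the stuck state described in this report. The timeout value is arbitrary and should be longer than a normal reply takes.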

Operating System

Any

Models Used

Gemini

Package Versions

├── livekit-agents[google] v1.2.18
│   ├── livekit v1.0.18
│   ├── livekit-api v1.0.7
│   │   ├── livekit-protocol v1.0.8
│   ├── livekit-blingfire v1.0.0
│   ├── livekit-protocol v1.0.8 (*)
│   └── livekit-plugins-google v1.2.18 (extra: google)
│       └── livekit-agents v1.2.18 (*)
├── livekit-api v1.0.7 (*)
├── livekit-plugins-noise-cancellation v0.2.5
│   └── livekit v1.0.18 (*)

Session/Room/Call IDs

No response

Proposed Solution

Additional Context

Using the Google GenAI SDK directly works correctly, so the issue seems to be in the LiveKit Google plugin.
Related code: RealtimeSession.generate_reply() and RealtimeSession.update_chat_ctx() in realtime_api.py

Screenshots and Recordings

No response

Nov 12 '25 23:11 frankfarzan

I'm experiencing the same with the OpenAI Realtime API:

"@livekit/agents": "^1.0.21",
"@livekit/agents-plugin-openai": "^1.0.21",
"@livekit/rtc-node": "^0.13.21",

Using the JS SDK like so:

import {
  useChat,
} from '@livekit/components-react'
...
const chat = useChat()
...
const send = useCallback(
  async (message: string) => {
    if (chat.isSending) {
      return
    }
    return await chat.send(message)
  },
  [chat],
)

When I send a message while the agent is speaking, the agent gets interrupted, but roughly 50% of the time it doesn't output a response.

I experimented with different turn detection types (null, semantic_vad, server_vad) without success. I also tried overriding the session's inputOptions to handle the incoming text by interrupting and then generating a reply, but no luck either:

await session.start({
  agent,
  room: ctx.room,
  inputOptions: {
    textInputCallback: async (sess: voice.AgentSession, ev: voice.TextInputEvent) => {
      await sess.interrupt().await
      sess.generateReply({ userInput: ev.text })
    },
  },
})

Any hints would be highly appreciated 🙏

Nov 26 '25 01:11 harlandjp