
Properly handle Gemini thought summaries and export them via OpenTelemetry in Google LLM plugin

Open · giovaborgogno opened this issue 1 month ago · 4 comments

Bug Description

Gemini’s API provides thought summaries when include_thoughts=true is enabled. These appear in content.parts with the part.thought flag set and must be handled separately from normal answer text. https://ai.google.dev/gemini-api/docs/thinking#summaries
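For reference, the linked docs separate thought parts from answer parts roughly like this with the google-genai SDK (the model name and prompt here are placeholder values):

from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the sum of the first 50 prime numbers?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

for part in response.candidates[0].content.parts:
    if not part.text:
        continue
    if part.thought:
        print("Thought summary:", part.text)  # internal reasoning, not for end users
    else:
        print("Answer:", part.text)  # user-facing answer text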

Right now, the Google LLM plugin in agents/livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py ignores part.thought entirely. The parsing code only looks at part.function_call and part.text:

def _parse_part(self, id: str, part: types.Part) -> llm.ChatChunk | None:
    if part.function_call:
        chat_chunk = llm.ChatChunk(
            id=id,
            delta=llm.ChoiceDelta(
                role="assistant",
                tool_calls=[
                    llm.FunctionToolCall(
                        arguments=json.dumps(part.function_call.args),
                        name=part.function_call.name,
                        call_id=part.function_call.id or utils.shortuuid("function_call_"),
                    )
                ],
                content=part.text,
            ),
        )
        return chat_chunk

    return llm.ChatChunk(
        id=id,
        delta=llm.ChoiceDelta(content=part.text, role="assistant"),
    )

There’s no check for part.thought, so thought summaries are treated as regular assistant output.

Problems

  1. Bug: thoughts are spoken by TTS. When include_thoughts=True, thought summaries are merged into the same content used for user-facing responses. The TTS layer receives them and reads the agent’s internal reasoning out loud, which is not what Gemini’s “thinking” feature is meant for.

  2. Missing observability of thoughts in OTEL. LiveKit already uses OpenTelemetry, but Gemini thought summaries are never surfaced there. There is currently no way to inspect the model’s internal reasoning in traces/logs while keeping it hidden from the end user and TTS.

Expected Behavior

Separation of thoughts vs. answer. Parts with part.thought == True should not be included in the assistant’s user-visible content, and should never be forwarded to TTS or any other channel meant for end-user output (see the sketch under Proposed Solution below).

Observability via existing OpenTelemetry. Thought summaries should be attached to the existing OTEL spans/traces for Gemini calls.
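A minimal sketch of what attaching them could look like with the OpenTelemetry API; the record_thought helper and the gemini.thought event/attribute names are made up for illustration, not existing LiveKit conventions:

from opentelemetry import trace

def record_thought(thought_text: str) -> None:
    # Attach the thought summary to whichever span is currently
    # recording the Gemini call, without touching assistant content.
    span = trace.get_current_span()
    if span.is_recording():
        span.add_event(
            "gemini.thought",
            attributes={"gemini.thought.text": thought_text},
        )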

Reproduction Steps

For example, configure a session like this (imports added for completeness; the elevenlabs_* and tts_language variables are placeholders from the original report):


from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs, google

session = AgentSession(
    stt="assemblyai/universal-streaming-multilingual",
    llm=google.LLM(
        model="gemini-2.5-flash-preview-09-2025",
        temperature=0.8,
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,
            thinking_budget=1500,
        ),
    ),
    tts=elevenlabs.TTS(
        voice_id=elevenlabs_voice_id,
        model=elevenlabs_model,
        language=tts_language,
    ),
)

Operating System

macOS Tahoe

Models Used

AssemblyAI (STT), Google plugin (Gemini LLM), ElevenLabs (TTS)

Package Versions

# Core LiveKit dependencies
livekit>=1.0.13
livekit-agents[images,elevenlabs]>=1.3.6
livekit-api>=1.0.5
livekit-protocol>=1.0.6

# LiveKit plugins
livekit-plugins-google>=1.3.6

# Google Gemini API
google-generativeai==0.8.3

# Telemetry (Langfuse/OpenTelemetry/Judgment Labs)
opentelemetry-api>=1.39.0
opentelemetry-sdk>=1.39.0
opentelemetry-exporter-otlp-proto-http>=1.39.0
judgeval>=0.1.0  # Judgment Labs tracing (OpenTelemetry compatible)

Session/Room/Call IDs

No response

Proposed Solution
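
One possible shape for the fix, sketched against the _parse_part method quoted above. This is a sketch, not a tested patch: _record_thought is a hypothetical helper (it could, for instance, forward the text to an OTEL event like the one sketched earlier), and it assumes callers of _parse_part already tolerate None returns, which the existing type hint suggests they do.

def _parse_part(self, id: str, part: types.Part) -> llm.ChatChunk | None:
    # Thought summaries are recorded for observability and withheld
    # from assistant content, so TTS never receives them.
    if part.thought:
        if part.text:
            self._record_thought(part.text)  # hypothetical helper; see the OTEL sketch above
        return None

    if part.function_call:
        return llm.ChatChunk(
            id=id,
            delta=llm.ChoiceDelta(
                role="assistant",
                tool_calls=[
                    llm.FunctionToolCall(
                        arguments=json.dumps(part.function_call.args),
                        name=part.function_call.name,
                        call_id=part.function_call.id or utils.shortuuid("function_call_"),
                    )
                ],
                content=part.text,
            ),
        )

    return llm.ChatChunk(
        id=id,
        delta=llm.ChoiceDelta(content=part.text, role="assistant"),
    )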


Additional Context

No response

Screenshots and Recordings

No response

giovaborgogno · Dec 06 '25 16:12