
Response terminates prematurely when using Gemini 3 via LiteLLM

Open themw123 opened this issue 2 months ago • 11 comments

Description

When using gemini-3-flash-preview through a LiteLLM proxy, OpenCode stops processing the response as soon as the model triggers a tool call. The model provides a reasoning block and a tool call, but OpenCode does not seem to execute the requested tool (e.g., read) and the interaction hangs or terminates without output.

The model response from LiteLLM looks like this (shortened for clarity):

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_023b94f976e14e3a8711dd9c9864",
            "type": "function",
            "function": {
              "name": "read",
              "arguments": "{\"filePath\": \"/path/to/file/test.txt\"}"
            },
            "provider_specific_fields": {
              "thought_signature": "..." 
            }
          }
        ],
        "reasoning_content": "**Synthesizing Knowledge Bases**\n\nI've been analyzing..."
      },
      "finish_reason": "stop"
    }
  ]
}

The issue does NOT occur when connecting Gemini 3 directly to OpenCode (without LiteLLM).
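One detail in the response above may be worth checking: in the OpenAI chat completions format, a turn that ends in tool calls normally reports `finish_reason: "tool_calls"`, while this response reports `"stop"`. Whether that mismatch is what makes OpenCode stop is an assumption, not something confirmed in this thread. A minimal sketch that flags the mismatch (the helper name `inspect_choice` is made up for illustration):

```python
import json


def inspect_choice(choice: dict) -> dict:
    """Summarize one entry of `choices` from an OpenAI-style chat completion."""
    message = choice.get("message", {})
    tool_calls = message.get("tool_calls") or []
    finish_reason = choice.get("finish_reason")
    return {
        "tool_names": [call["function"]["name"] for call in tool_calls],
        "finish_reason": finish_reason,
        "has_reasoning": bool(message.get("reasoning_content")),
        # True when tool calls are present but finish_reason does not say so.
        "mismatch": bool(tool_calls) and finish_reason != "tool_calls",
    }


# Shortened copy of the response shown above (elided values kept as "...").
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_023b94f976e14e3a8711dd9c9864",
                        "type": "function",
                        "function": {
                            "name": "read",
                            "arguments": json.dumps(
                                {"filePath": "/path/to/file/test.txt"}
                            ),
                        },
                    }
                ],
                "reasoning_content": "...",
            },
            "finish_reason": "stop",
        }
    ]
}

summary = inspect_choice(response["choices"][0])
print(summary)
```

If `mismatch` is true for responses from the proxy but false when calling Gemini directly, that would point at the translation layer rather than at the model.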

OpenCode version

1.0.203

LiteLLM Version

1.80.11

Steps to reproduce

  1. Set up LiteLLM with a Gemini 3 model. For example, here is my LiteLLM config:
model_list:
  - model_name: gemini/gemini-3-flash-preview
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: xxx
      drop_params: true
  2. Configure OpenCode to use the LiteLLM endpoint:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "litellm": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "litellm",
      "options": {
        "baseURL": "https://localhost:4000/v1",
        "apiKey": "sk-xxx"
      },
      "models": {
        "gemini/gemini-3-flash-preview": {
          "name": "gemini/gemini-3-flash-preview",
          "options": {
            "reasoningEffort": "high"
          }
        }
      }
    }
  }
}
  3. Ask a question that requires reading a file or using a tool.

Observe that the process stops and no file is read, despite the model requesting it in the JSON.
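To see what the proxy returns without OpenCode in the loop, the same kind of request can be sent by hand. The sketch below only uses the Python standard library and assumes the OpenAI-style `/chat/completions` route under the `baseURL` from the config above. The `read` tool schema is a guess modeled on the `filePath` argument in the response, not OpenCode's actual tool definition, and the function names are made up for illustration.

```python
import json
import urllib.request


def build_probe_request(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a chat completion request that offers a single `read` tool."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Read /path/to/file/test.txt"}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical schema, modeled on the tool call in the issue.
                    "name": "read",
                    "description": "Read a file from disk",
                    "parameters": {
                        "type": "object",
                        "properties": {"filePath": {"type": "string"}},
                        "required": ["filePath"],
                    },
                },
            }
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


# Values copied from the configs above; replace them with your own.
request = build_probe_request(
    "https://localhost:4000/v1", "sk-xxx", "gemini/gemini-3-flash-preview"
)
```

Calling `send(request)` against a running proxy returns the raw completion, which can then be compared with the response pasted in the description.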

Screenshot and/or share link

No response

Operating System

Windows 11 with WSL 2

Terminal

No response

themw123 avatar Dec 27 '25 12:12 themw123

This issue might be a duplicate of existing issues. Please check:

  • #3365: Opencode with litellm just stops before finishing the task (closed - similar symptom with LiteLLM proxy causing OpenCode to stop mid-task)
  • #4832: [BUG]: Gemini 3 Pro function calling fails - missing thoughtSignature support (Gemini 3 tool calling compatibility issue)
  • #3596: SSE Stream Bug: Out-of-Order thinking_delta via LiteLLM → AWS Bedrock (LiteLLM proxy causing response handling issues)
  • #2915: LiteLLM error: Anthropic doesn't support tool calling without tools= param specified (LiteLLM proxy integration issues)

Feel free to ignore if none of these address your specific case.

github-actions[bot] avatar Dec 27 '25 12:12 github-actions[bot]

Do not use the @ai-sdk/openai-compatible provider with Gemini on LiteLLM. Use @ai-sdk/google instead.

emerzon avatar Jan 10 '26 03:01 emerzon

@emerzon Thanks for the suggestion, but using @ai-sdk/google would bypass LiteLLM entirely and connect directly to Gemini.

The whole point here is to use LiteLLM as a proxy. LiteLLM exposes an OpenAI-compatible API regardless of the backend model, so @ai-sdk/openai-compatible is the correct choice.

The issue seems to be with how OpenCode handles Gemini 3's special response fields (reasoning_content, thought_signature) when passed through LiteLLM. This configuration works fine with Claude Code and Continue.dev, so it appears to be an OpenCode specific issue.

themw123 avatar Jan 10 '26 14:01 themw123

@themw123 This is not true. You can still set the base URL so that requests go through LiteLLM, but they use the Gemini request format without OpenAI format translation. I am using it exactly like that.

  "provider": {
    "litellm-google": {
      "npm": "@ai-sdk/google",
      "name": "LiteLLM Google",
      "options": {
        "baseURL": "https://litellm.instance"
      },
      "models": {
        "gemini-3-pro-high": {
          "id": "gemini-3-pro-preview",
          "name": "Gemini 3 Pro Preview (High Thinking)",
          "options": {
            "thinkingConfig": {
              "thinkingLevel": "high",
              "includeThoughts": true
            }
          }
        },

emerzon avatar Jan 10 '26 15:01 emerzon

I tried to use the passthrough config from emerzon, but it did not work.

Connecting to the LiteLLM proxy using openai-compatible works, but it stops constantly and is not usable.

darwincrv avatar Jan 14 '26 02:01 darwincrv

I tried to use the passthrough config from emerzon, but it did not work.

Connecting to the LiteLLM proxy using openai-compatible works, but it stops constantly and is not usable.

I have been using this config without any issues. Which issues did you have?

emerzon avatar Jan 14 '26 05:01 emerzon

My tests indicate that the Gemini API is not supported for the passthrough API, so I tried using Vertex: https://docs.litellm.ai/docs/pass_through/vertex_ai

So I configured LiteLLM like this:

config.yaml

model_list:
  # Vertex AI
  - model_name: vertex_ai/*
    litellm_params:
      model: vertex_ai/*
      vertex_project: "xxxxxxxxxxxxxxxxxx"
      vertex_location: "global"
      vertex_credentials: os.environ/GOOGLE_APPLICATION_CREDENTIALS
      use_in_pass_through: true

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 2
  timeout: 300
  retry_after: 10
  optional_pre_call_checks: ["responses_api_deployment_check"]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  auto_update_model_cost_map: true
  store_prompts_in_spend_logs: true
  use_x_forwarded_for: true

litellm_settings:
  max_budget: 100
  budget_duration: 30d
  timezone: "Australia/Sydney"
  drop_params: True
  cache: True
  cache_params:
    type: redis
    host: localhost
    port: 6379
    url: redis://localhost:6379/0
  check_provider_endpoint: false
  num_retries: 3
  retry_after: 10
  request_timeout: 300
  allowed_fails: 2
  cooldown_time: 10

My OpenCode settings are similar to yours, but I noticed you didn't define authentication. I am getting an authentication error because my LiteLLM proxy requires an API key. Have you been able to get it to work when the LiteLLM proxy requires authentication?

    "dr-google": {
      "npm": "@ai-sdk/google",
      "name": "LiteLLM Google (passthrough)",
      "options": {
        "baseURL": "http://192.168.1.212:4000/vertex_ai/",
        "apiKey": "sk-xyzabc",
        "headers": {
          "x-litellm-api-key": "Bearer sk-xyzabc"
        }
      },
      "models": {
        "vertex_ai/gemini-3-pro-preview": {
          "name": "Gemini 3 Pro (Native)",
          "limit": {
            "context": 1048576,
            "output": 65536
          },
          "cost": {
            "input": 2,
            "output": 12
          },
          "options": {
            "thinkingConfig": {
              "thinkingLevel": "high",
              "includeThoughts": true
            }
          }
        },

darwincrv avatar Jan 14 '26 14:01 darwincrv

I am using it with Vertex, but I don't have use_in_pass_through: true in my config. I also have individual model entries for each model, i.e. gemini-3-pro-preview, gemini-3-flash-preview, etc.

My auth to Vertex is handled via env vars: GOOGLE_APPLICATION_CREDENTIALS (pointing to the keyfile with credentials), VERTEX_PROJECT and VERTEX_LOCATION, but I suppose this won't matter much.

For the client authentication, you should not set the credentials in the config file. Use the /connect option later in the UI to provide the API key.

emerzon avatar Jan 14 '26 17:01 emerzon

Thanks @emerzon, I've been running in circles trying to get that to work! I was also using the OpenAI-compatible model. I don't seem to get cost reported in OpenCode with this setup though, even though my LiteLLM instance returns token usage. Any idea how to fix that too?

abrouaux avatar Jan 14 '26 18:01 abrouaux

I don't seem to get cost reported on opencode though with this setup, even though my LiteLLM instance returns token usage - any idea how to fix that too?

I think the only way so far is to manually set costs in the model definition:

    "litellm-google": {
      "npm": "@ai-sdk/google",
      "name": "LiteLLM (Google)",
      "options": {
        "baseURL": "https://llm.instance"
      },
      "models": {
        "gemini-3-pro-preview": {
          "name": "Gemini 3 Pro Preview",
          "cost": {
            "input": 2,
            "output": 12,
            "cache_read": 0.2
          },
          "limit": {
            "context": 1000000,
            "output": 65536
          },
          "options": {
            "includeThoughts": true
          },
          "variants": {
            "high": {
              "options": {
                "thinkingConfig": {
                  "thinkingLevel": "high"
                }
              }
            },
            "low": {
              "options": {
                "thinkingConfig": {
                  "thinkingLevel": "low"
                }
              }
            }
          }
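As a rough check on the numbers in the `cost` block above, the arithmetic can be sketched as follows. This assumes the values are USD per million tokens and that cached input tokens are billed at `cache_read` instead of `input`. Both are assumptions about how OpenCode interprets the block, and `estimate_cost_usd` is a made-up helper, not an OpenCode function.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    *,
    cost: dict,
) -> float:
    """Estimate the cost of one request from per-million-token prices."""
    per_token = {name: price / 1_000_000 for name, price in cost.items()}
    return (
        input_tokens * per_token["input"]          # uncached prompt tokens
        + output_tokens * per_token["output"]      # generated tokens
        + cached_tokens * per_token.get("cache_read", 0.0)  # cache hits
    )


# Prices copied from the model definition above.
COST = {"input": 2, "output": 12, "cache_read": 0.2}

# 100k uncached input tokens, 5k output tokens, 50k cached input tokens.
example = estimate_cost_usd(100_000, 5_000, 50_000, cost=COST)
print(f"{example:.4f} USD")
```

Under those assumptions the example request comes to about 0.27 USD (0.20 for input, 0.06 for output, 0.01 for cache reads).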

emerzon avatar Jan 14 '26 19:01 emerzon

thanks @emerzon that worked

darwincrv avatar Jan 15 '26 01:01 darwincrv

Upgraded to a newer version, and now it seems to be fixed even with "npm": "@ai-sdk/openai-compatible".

themw123 avatar Jan 23 '26 09:01 themw123