ThinkingBudget not respected
Sometimes the 'ThinkingBudget' (limit) is not respected:
AI model : models/gemini-2.5-pro-preview-05-06
Generated : Monday, 02-Jun-25 09:13:54 CEST
Processing : 128.5 secs for 1 candidate
Tokens : 395939 (Thoughts: 355987, Prompt: 38651, Candidates: 1301)
Note: Thoughts: 355987, even though thinking was limited to '8000' tokens.
The request to the API (v1.8.0) shows this (printed with spew):
ThinkingConfig: (*genai.ThinkingConfig)(0x140003fc350)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101ebb384)(8000)
})
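For reference, the ThinkingConfig above is set in my program roughly like this (simplified Go sketch; the real prompt, the attached file parts and the client setup differ, and error handling is shortened):

```go
package main

import (
	"context"
	"log"
	"os"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()

	// Client for the Gemini Developer API; the key comes from the environment.
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  os.Getenv("GEMINI_API_KEY"),
		Backend: genai.BackendGeminiAPI,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Limit thinking to 8000 tokens and ask for the thoughts to be included.
	budget := int32(8000)
	config := &genai.GenerateContentConfig{
		ThinkingConfig: &genai.ThinkingConfig{
			IncludeThoughts: true,
			ThinkingBudget:  &budget,
		},
	}

	resp, err := client.Models.GenerateContent(ctx,
		"models/gemini-2.5-pro-preview-05-06",
		genai.Text("<prompt>"), // real prompt and attached files omitted here
		config,
	)
	if err != nil {
		log.Fatal(err)
	}

	// Expectation: ThoughtsTokenCount stays at or below the configured budget.
	log.Printf("ThoughtsTokenCount: %d", resp.UsageMetadata.ThoughtsTokenCount)
}
```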
The response from the API (v1.8.0) shows this (printed with spew):
ModelVersion: (string) (len=35) "models/gemini-2.5-pro-preview-05-06",
PromptFeedback: (*genai.GenerateContentResponsePromptFeedback)(<nil>),
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140001d42d0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 1301,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 38651,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x140004b40c0)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 38651
})
},
ThoughtsTokenCount: (int32) 355987,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 395939,
TrafficType: (genai.TrafficType) ""
})
It is unclear whether this is a defect in the API or in the underlying AI model.
If helpful, a full spew dump (request + response) is available.
BTW: The defect leads to unexpectedly high costs.
@Klaus-Tockloth Could you provide the full request so that I can rephrase it as a curl command and route this issue to the backend team? Thanks!
Following is the request logged with "spew" and the subsequent response:
Recreating the situation might be easier with this short prompt (originally in German):
Extend the web page "api.html" with the description of the REST call 'ContoursRequest' and its response 'ContoursResponse'.
The following files were part of the prompt:
/Users/klaustockloth/hoehendaten-dev.de/api.html (20250602-083208, 26.5 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/dud.html (20250602-065930, 10.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/gpx.html (20250602-065937, 17.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/hoehenlinien.html (20250601-105725, 21.5 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/impressum.html (20250602-065947, 8.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/index.html (20250602-065955, 15.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/karte.html (20250602-070004, 3.2 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/punkt_utm.html (20250602-070011, 12.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/punkt.html (20250602-070018, 13.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/style.css (20250529-180000, 5.5 KiB, text/plain)
/Users/klaustockloth/go/src/klaus/elevation/dtm-elevation-service-dev/common.go (20250601-085853, 17.8 KiB, text/plain)
Note: The added files may have changed slightly in the meantime. Please compare them with the dumped files contained in the 'spew' log.
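How the files are attached should not matter for the budget issue; for reproduction they can be sent as inline parts, roughly like this (sketch with hard-coded values; my program builds the file list and MIME types dynamically):

```go
package main

import (
	"log"
	"os"

	"google.golang.org/genai"
)

// buildContents assembles the user content: the short prompt plus each
// listed file attached as an inline part with its MIME type.
func buildContents(prompt string, files map[string]string) []*genai.Content {
	parts := []*genai.Part{{Text: prompt}}
	for path, mimeType := range files {
		data, err := os.ReadFile(path)
		if err != nil {
			log.Fatal(err)
		}
		parts = append(parts, &genai.Part{
			InlineData: &genai.Blob{MIMEType: mimeType, Data: data},
		})
	}
	return []*genai.Content{{Role: "user", Parts: parts}}
}

func main() {
	contents := buildContents(
		`Extend the web page "api.html" with the description of the REST call 'ContoursRequest' and its response 'ContoursResponse'.`,
		map[string]string{
			"/Users/klaustockloth/hoehendaten-dev.de/api.html": "text/html",
			// ... remaining files from the list above
		},
	)
	log.Printf("parts in request: %d", len(contents[0].Parts))
}
```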
BTW: I would describe the response (after 395939 tokens of thinking) as excellent.
Hi @Klaus-Tockloth, could you try it with the newest Gemini 2.5 Pro model, "gemini-2.5-pro-preview-06-05", to see if the issue still exists?
Thanks!
The issue also occurs when I use the newest Gemini 2.5 Pro model, "gemini-2.5-pro-preview-06-05":
ThinkingBudget: (*int32)(0x101a3b384)(8000) ThoughtsTokenCount: (int32) 21360
My expectation is that 'ThoughtsTokenCount' should never exceed 'ThinkingBudget'.
AI model : gemini-2.5-pro-preview-06-05
Generated : Wednesday, 11-Jun-25 07:02:01 CEST
Processing : 64.6 secs for 1 candidate
Tokens : 32417 (Thoughts: 21360, Prompt: 5322, Candidates: 5735)
ThinkingConfig: (*genai.ThinkingConfig)(0x14000044360)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101a3b384)(8000)
})
...
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140001425a0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 5735,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 5322,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x1400013f038)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 5322
})
},
ThoughtsTokenCount: (int32) 21360,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 32417,
TrafficType: (genai.TrafficType) ""
})
Second example (with gemini-2.5-pro-preview-06-05):
ThinkingBudget: (*int32)(0x101a27384)(8000) ThoughtsTokenCount: (int32) 244462
AI model : gemini-2.5-pro-preview-06-05
Generated : Wednesday, 11-Jun-25 17:00:52 CEST
Processing : 225.3 secs for 1 candidate
Tokens : 289072 (Thoughts: 244462, Prompt: 23865, Candidates: 20745)
ThinkingConfig: (*genai.ThinkingConfig)(0x14000404350)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101a27384)(8000)
})
...
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140000181b0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 20745,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 23865,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x1400036aa68)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 23865
})
},
ThoughtsTokenCount: (int32) 244462,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 289072,
TrafficType: (genai.TrafficType) ""
})
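As a stop-gap I now check the returned usage metadata against the configured budget after every call and log when the limit was ignored; a minimal sketch (the hard-coded values below are taken from the second example, for illustration only):

```go
package main

import (
	"log"

	"google.golang.org/genai"
)

// warnIfBudgetExceeded logs a warning when the model used more thinking
// tokens than the configured ThinkingBudget allows.
func warnIfBudgetExceeded(resp *genai.GenerateContentResponse, budget int32) {
	if resp == nil || resp.UsageMetadata == nil {
		return
	}
	if got := resp.UsageMetadata.ThoughtsTokenCount; got > budget {
		log.Printf("warning: ThinkingBudget=%d not respected, ThoughtsTokenCount=%d", budget, got)
	}
}

func main() {
	// Usage metadata as reported in the second example above.
	resp := &genai.GenerateContentResponse{
		UsageMetadata: &genai.GenerateContentResponseUsageMetadata{
			ThoughtsTokenCount: 244462,
		},
	}
	warnIfBudgetExceeded(resp, 8000)
}
```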
Thanks for the details. I have asked the relevant team for clarification on ThinkingBudget but haven't received a reply yet.
Is there anything new regarding this issue?
The problem also occurs with the production version of the "Gemini 2.5 Pro" model. The new "front-runner" is this query, with over half a million tokens spent on thinking, which corresponds to a cost of more than 5 dollars for a single query.
AI model : gemini-2.5-pro
Generated : Sunday, 29-Jun-25 12:47:02 CEST
Processing : 138.1 secs for 1 candidate
Tokens : 589891 (Thoughts: 559260, Prompt: 22695, Candidates: 7936)
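For reference, a rough cost estimate, assuming the published list prices for gemini-2.5-pro with prompts up to 200k tokens (about 1.25 USD per 1M input tokens and 10 USD per 1M output tokens, with thinking tokens billed as output); actual billing may differ:

```go
package main

import "fmt"

func main() {
	// Token counts from the query above.
	const (
		promptTokens     = 22695.0
		thoughtsTokens   = 559260.0
		candidatesTokens = 7936.0
	)

	// Assumed list prices in USD per 1M tokens (gemini-2.5-pro, prompt <= 200k).
	const (
		inputPricePer1M  = 1.25
		outputPricePer1M = 10.00
	)

	cost := promptTokens*inputPricePer1M/1e6 +
		(thoughtsTokens+candidatesTokens)*outputPricePer1M/1e6
	fmt.Printf("approximate cost: %.2f USD\n", cost) // ~5.70 USD
}
```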