ThinkingBudget not respected
Sometimes the 'ThinkingBudget' (limit) is not respected:
AI model : models/gemini-2.5-pro-preview-05-06
Generated : Monday, 02-Jun-25 09:13:54 CEST
Processing : 128.5 secs for 1 candidate
Tokens : 395939 (Thoughts: 355987, Prompt: 38651, Candidates: 1301)
Note: Thoughts: 355987, even though thinking was limited to '8000' tokens.
The request to the API (v1.8.0) shows this (printed with spew):
ThinkingConfig: (*genai.ThinkingConfig)(0x140003fc350)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101ebb384)(8000)
})
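For reference, the ThinkingConfig above is set in my program roughly like this (simplified Go sketch; the real prompt, the attached file parts and the client setup differ, and error handling is shortened):

```go
package main

import (
	"context"
	"log"
	"os"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()

	// Client for the Gemini Developer API; the key comes from the environment.
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  os.Getenv("GEMINI_API_KEY"),
		Backend: genai.BackendGeminiAPI,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Limit thinking to 8000 tokens and ask for the thoughts to be included.
	budget := int32(8000)
	config := &genai.GenerateContentConfig{
		ThinkingConfig: &genai.ThinkingConfig{
			IncludeThoughts: true,
			ThinkingBudget:  &budget,
		},
	}

	resp, err := client.Models.GenerateContent(ctx,
		"models/gemini-2.5-pro-preview-05-06",
		genai.Text("<prompt>"), // real prompt and attached files omitted here
		config,
	)
	if err != nil {
		log.Fatal(err)
	}

	// Expectation: ThoughtsTokenCount stays at or below the configured budget.
	log.Printf("ThoughtsTokenCount: %d", resp.UsageMetadata.ThoughtsTokenCount)
}
```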
The response from the API (v1.8.0) shows this (printed with spew):
ModelVersion: (string) (len=35) "models/gemini-2.5-pro-preview-05-06",
PromptFeedback: (*genai.GenerateContentResponsePromptFeedback)(<nil>),
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140001d42d0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 1301,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 38651,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x140004b40c0)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 38651
})
},
ThoughtsTokenCount: (int32) 355987,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 395939,
TrafficType: (genai.TrafficType) ""
})
It is unclear whether this is a defect in the API or in the underlying AI model.
If helpful, a full spew dump (request + response) is available.
BTW: The defect leads to unexpectedly high costs.
@Klaus-Tockloth Could you provide the full request so that I can rephrase it as a curl command and route this issue to the backend team? Thanks!
Following is the request logged with "spew" and the subsequent response:
Recreating the situation might be easier with this short prompt (originally in German):
Extend the web page "api.html" with the description of the REST call 'ContoursRequest' and its response 'ContoursResponse'.
The following files were part of the prompt:
/Users/klaustockloth/hoehendaten-dev.de/api.html (20250602-083208, 26.5 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/dud.html (20250602-065930, 10.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/gpx.html (20250602-065937, 17.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/hoehenlinien.html (20250601-105725, 21.5 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/impressum.html (20250602-065947, 8.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/index.html (20250602-065955, 15.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/karte.html (20250602-070004, 3.2 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/punkt_utm.html (20250602-070011, 12.3 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/punkt.html (20250602-070018, 13.6 KiB, text/html)
/Users/klaustockloth/hoehendaten-dev.de/style.css (20250529-180000, 5.5 KiB, text/plain)
/Users/klaustockloth/go/src/klaus/elevation/dtm-elevation-service-dev/common.go (20250601-085853, 17.8 KiB, text/plain)
Note: The added files may have changed slightly in the meantime. Please compare them with the dumped files contained in the 'spew' log.
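How the files are attached should not matter for the budget issue; for reproduction they can be sent as inline parts, roughly like this (sketch with hard-coded values; my program builds the file list and MIME types dynamically):

```go
package main

import (
	"log"
	"os"

	"google.golang.org/genai"
)

// buildContents assembles the user content: the short prompt plus each
// listed file attached as an inline part with its MIME type.
func buildContents(prompt string, files map[string]string) []*genai.Content {
	parts := []*genai.Part{{Text: prompt}}
	for path, mimeType := range files {
		data, err := os.ReadFile(path)
		if err != nil {
			log.Fatal(err)
		}
		parts = append(parts, &genai.Part{
			InlineData: &genai.Blob{MIMEType: mimeType, Data: data},
		})
	}
	return []*genai.Content{{Role: "user", Parts: parts}}
}

func main() {
	contents := buildContents(
		`Extend the web page "api.html" with the description of the REST call 'ContoursRequest' and its response 'ContoursResponse'.`,
		map[string]string{
			"/Users/klaustockloth/hoehendaten-dev.de/api.html": "text/html",
			// ... remaining files from the list above
		},
	)
	log.Printf("parts in request: %d", len(contents[0].Parts))
}
```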
BTW: I would describe the response (after 395939 tokens of thinking) as excellent.
Hi @Klaus-Tockloth, could you try it with the newest Gemini 2.5 Pro model, "gemini-2.5-pro-preview-06-05", to see if the issue still exists?
Thanks!
The issue also occurs when I use the newest Gemini 2.5 Pro model, "gemini-2.5-pro-preview-06-05":
ThinkingBudget: (*int32)(0x101a3b384)(8000) ThoughtsTokenCount: (int32) 21360
My expectation is that 'ThoughtsTokenCount' should never exceed 'ThinkingBudget'.
AI model : gemini-2.5-pro-preview-06-05
Generated : Wednesday, 11-Jun-25 07:02:01 CEST
Processing : 64.6 secs for 1 candidate
Tokens : 32417 (Thoughts: 21360, Prompt: 5322, Candidates: 5735)
ThinkingConfig: (*genai.ThinkingConfig)(0x14000044360)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101a3b384)(8000)
})
...
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140001425a0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 5735,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 5322,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x1400013f038)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 5322
})
},
ThoughtsTokenCount: (int32) 21360,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 32417,
TrafficType: (genai.TrafficType) ""
})
Second example (with gemini-2.5-pro-preview-06-05):
ThinkingBudget: (*int32)(0x101a27384)(8000) ThoughtsTokenCount: (int32) 244462
AI model : gemini-2.5-pro-preview-06-05
Generated : Wednesday, 11-Jun-25 17:00:52 CEST
Processing : 225.3 secs for 1 candidate
Tokens : 289072 (Thoughts: 244462, Prompt: 23865, Candidates: 20745)
ThinkingConfig: (*genai.ThinkingConfig)(0x14000404350)({
IncludeThoughts: (bool) true,
ThinkingBudget: (*int32)(0x101a27384)(8000)
})
...
UsageMetadata: (*genai.GenerateContentResponseUsageMetadata)(0x140000181b0)({
CacheTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
CachedContentTokenCount: (int32) 0,
CandidatesTokenCount: (int32) 20745,
CandidatesTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
PromptTokenCount: (int32) 23865,
PromptTokensDetails: ([]*genai.ModalityTokenCount) (len=1 cap=1) {
(*genai.ModalityTokenCount)(0x1400036aa68)({
Modality: (genai.MediaModality) (len=4) "TEXT",
TokenCount: (int32) 23865
})
},
ThoughtsTokenCount: (int32) 244462,
ToolUsePromptTokenCount: (int32) 0,
ToolUsePromptTokensDetails: ([]*genai.ModalityTokenCount) <nil>,
TotalTokenCount: (int32) 289072,
TrafficType: (genai.TrafficType) ""
})
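As a stop-gap I now check the returned usage metadata against the configured budget after every call and log when the limit was ignored; a minimal sketch (the hard-coded values below are taken from the second example, for illustration only):

```go
package main

import (
	"log"

	"google.golang.org/genai"
)

// warnIfBudgetExceeded logs a warning when the model used more thinking
// tokens than the configured ThinkingBudget allows.
func warnIfBudgetExceeded(resp *genai.GenerateContentResponse, budget int32) {
	if resp == nil || resp.UsageMetadata == nil {
		return
	}
	if got := resp.UsageMetadata.ThoughtsTokenCount; got > budget {
		log.Printf("warning: ThinkingBudget=%d not respected, ThoughtsTokenCount=%d", budget, got)
	}
}

func main() {
	// Usage metadata as reported in the second example above.
	resp := &genai.GenerateContentResponse{
		UsageMetadata: &genai.GenerateContentResponseUsageMetadata{
			ThoughtsTokenCount: 244462,
		},
	}
	warnIfBudgetExceeded(resp, 8000)
}
```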
Thanks for the details. I have asked the relevant team for clarification on ThinkingBudget but haven't received a reply yet.
Is there anything new regarding this issue?
The problem also occurs with the production version of the "Gemini 2.5 Pro" model. The new "front-runner" is this query, with over half a million tokens spent on thinking, which corresponds to a cost of more than 5 dollars for a single query.
AI model : gemini-2.5-pro
Generated : Sunday, 29-Jun-25 12:47:02 CEST
Processing : 138.1 secs for 1 candidate
Tokens : 589891 (Thoughts: 559260, Prompt: 22695, Candidates: 7936)
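For reference, a rough cost estimate, assuming the published list prices for gemini-2.5-pro with prompts up to 200k tokens (about 1.25 USD per 1M input tokens and 10 USD per 1M output tokens, with thinking tokens billed as output); actual billing may differ:

```go
package main

import "fmt"

func main() {
	// Token counts from the query above.
	const (
		promptTokens     = 22695.0
		thoughtsTokens   = 559260.0
		candidatesTokens = 7936.0
	)

	// Assumed list prices in USD per 1M tokens (gemini-2.5-pro, prompt <= 200k).
	const (
		inputPricePer1M  = 1.25
		outputPricePer1M = 10.00
	)

	cost := promptTokens*inputPricePer1M/1e6 +
		(thoughtsTokens+candidatesTokens)*outputPricePer1M/1e6
	fmt.Printf("approximate cost: %.2f USD\n", cost) // ~5.70 USD
}
```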