
thinking_budget=0 does not work

Open miroblog opened this issue 7 months ago • 19 comments

Description of the bug:

response.usage_metadata.thoughts_token_count

I get a non-zero value for the thought token count:

from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# prompt is defined elsewhere; len(prompt) is over 100k
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=[prompt],
    config=genai.types.GenerateContentConfig(
        thinking_config=genai.types.ThinkingConfig(thinking_budget=0)
    ),
)
print(
    f"Token usage: prompt={response.usage_metadata.prompt_token_count}, "
    f"candidates={response.usage_metadata.candidates_token_count}, "
    f"thoughts={response.usage_metadata.thoughts_token_count}"
)

I get: Token usage: prompt=100858, candidates=4076, thoughts=1515

Actual vs expected behavior:

Expected: Token usage: prompt=100858, candidates=4076, thoughts=0

Any other information you'd like to share?

This works for shorter prompts, but for longer prompts I get non-zero thought tokens even though I set the budget to 0.

miroblog avatar Apr 18 '25 04:04 miroblog

Hello @miroblog, I'm not able to reproduce the issue, even with context longer than yours. Can you share with me the long context you are using?

Giom-V avatar Apr 18 '25 15:04 Giom-V

A similar issue was raised here: https://discuss.ai.google.dev/t/gemini-2-5-flash-preview-04-17-not-honoring-thinking-budget-0/80165/3

miroblog avatar Apr 20 '25 13:04 miroblog

Hey @miroblog, I tried it with a larger context (around 200k), and now it's working as expected. The issue seems to be fixed.

Let me know if you're still facing the issue. Thanks.

Gunand3043 avatar Apr 22 '25 05:04 Gunand3043

This is not fixed for me; I find it's highly prompt-dependent, though. Some prompts will run fine with no thinking 10/10 times; others will use thinking tokens at least 50% of the time.

kyleholgate avatar Apr 22 '25 11:04 kyleholgate

@miroblog The team confirms there's a known issue where Gemini sometimes still thinks a bit even when told not to. I'll keep the thread up to date as I get updates.

Giom-V avatar Apr 22 '25 11:04 Giom-V

Is there any update on this issue?

X901 avatar Apr 25 '25 20:04 X901

I have the same issue. Is there an update for this yet? @Giom-V

vaidy12345 avatar Apr 28 '25 06:04 vaidy12345

I think this issue was fixed in the latest model update, gemini-2.5-flash-preview-05-20.

X901 avatar May 21 '25 15:05 X901

That's awesome, thank you so much :)

vaidy12345 avatar May 21 '25 15:05 vaidy12345

Issue seems to persist with the new model as well. Even though I set thinking_budget=0, the model still uses thinking tokens in most of my test cases.

kanzyai-emirarditi avatar May 21 '25 18:05 kanzyai-emirarditi

Just checking in to see if there are any updates? I’m still experiencing the issue even with thinking_budget=0. Would be great to know if a fix is in progress or if there’s a recommended workaround. Thanks!

cspiecker avatar Jun 26 '25 13:06 cspiecker

> Just checking in to see if there are any updates? I’m still experiencing the issue even with thinking_budget=0. Would be great to know if a fix is in progress or if there’s a recommended workaround. Thanks!

They released the final version a few days ago, check it out.

X901 avatar Jun 26 '25 13:06 X901

Working on the 2.5-flash final version. With reasoning_effort="none", there is still a small chance 2.5-flash will think, maybe caused by a custom thinking key in my result JSON structure. Now I'm testing thinking_budget.

usage=CompletionUsage(completion_tokens=88, prompt_tokens=1238, total_tokens=2176, completion_tokens_details=None, prompt_tokens_details=None)
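For reference, here's roughly what that reasoning_effort call looks like through Gemini's OpenAI-compatible endpoint; this is a sketch, and the client setup, API key, and prompt are placeholders, not my actual code:

# Sketch only: openai SDK pointed at Gemini's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",  # placeholder
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    reasoning_effort="none",  # intended to disable thinking entirely
    messages=[{"role": "user", "content": "Reply with a short JSON object."}],
)
# If total_tokens is much larger than prompt + completion tokens, the model
# probably spent the difference on (hidden) thinking.
print(response.usage)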

mammothrider avatar Jul 01 '25 09:07 mammothrider

> Working on the 2.5-flash final version. With reasoning_effort="none", there is still a small chance 2.5-flash will think, maybe caused by a custom thinking key in my result JSON structure. Now I'm testing thinking_budget.

> usage=CompletionUsage(completion_tokens=88, prompt_tokens=1238, total_tokens=2176, completion_tokens_details=None, prompt_tokens_details=None)

In the response, if you see thoughts with a non-zero number, it means the model was thinking; if thoughts=0, it was not thinking.

X901 avatar Jul 01 '25 09:07 X901

> > Working on the 2.5-flash final version. With reasoning_effort="none", there is still a small chance 2.5-flash will think, maybe caused by a custom thinking key in my result JSON structure. Now I'm testing thinking_budget.

> > usage=CompletionUsage(completion_tokens=88, prompt_tokens=1238, total_tokens=2176, completion_tokens_details=None, prompt_tokens_details=None)

> In the response, if you see thoughts with a non-zero number, it means the model was thinking; if thoughts=0, it was not thinking.

I don't see thinking or thoughts in my API call response, but some of my requests fail with a "Could not parse response content as the length limit was reached" error. I set max_tokens to 1024, and my results are normally around 100 tokens, yet all of these error logs show total_tokens above 2k. After some searching: total_tokens = prompt_tokens + output_tokens, and output_tokens = thinking_tokens + completion_tokens, so I can only guess that for some requests Gemini is doing some thinking.
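To make that guess concrete, here's the arithmetic on the CompletionUsage above (the output = thinking + completion split is an assumption; the usage object doesn't report thinking tokens directly):

# Estimate hidden thinking tokens from the CompletionUsage numbers above.
# Assumption: total = prompt + output, and output = thinking + completion.
prompt_tokens = 1238
completion_tokens = 88
total_tokens = 2176

implied_thinking = total_tokens - prompt_tokens - completion_tokens
print(implied_thinking)  # 850 tokens unaccounted for, presumably thinking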

Edit: I tested with thinking_budget=0, and the problem still exists.

extra_body = {
    "extra_body": {
        "google": {
            "thinking_config": {
                "thinking_budget": 0,
                "include_thoughts": True,  # for debug purposes
            }
        }
    }
}
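For completeness, this is roughly how such a config gets forwarded through the openai client (a sketch; the client setup and message are placeholders, and the nesting matches the snippet above):

# Sketch: forwarding the thinking_config above via the openai SDK's
# extra_body parameter, which is merged into the request body as-is.
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",  # placeholder
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Return the analysis as JSON."}],
    max_tokens=1024,
    extra_body=extra_body,  # the dict defined above
)
print(response.choices[0].message.content)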

About 10% of the requests will still think. One resulting completion after model_dump():

{
    "asctime": "2025-07-04 14:06:11",
    "severity": "DEBUG",
    "name": "services.openai_api",
    "module": "openai_api",
    "funcName": "get_completion_with_formatter",
    "lineno": 222,
    "correlation_id": "-",
    "message": "",
    "id": "",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "<thought>***</thought>{\n  \"analysis\": \"***\",\n  \"user_output\": \"***\"\n}",
                "refusal": null,
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": [

                ],
                "parsed": {
                    "analysis": "***",
                    "user_output": "***"
                },
                "extra_content": {
                    "google": {
                        "thought": true
                    }
                }
            }
        }
    ],
    "created": 1751609171,
    "model": "gemini-2.5-flash",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 87,
        "prompt_tokens": 1238,
        "total_tokens": 2164,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}

The extra_content part does not exist in my normal responses, and I wonder if this is some kind of service-level bug.

mammothrider avatar Jul 01 '25 09:07 mammothrider

This issue is occurring with gemini-2.5-flash-preview-09-2025. It does not respect the thinking_budget: setting the budget to 0 has no impact, and even setting it to a definite amount, e.g. 5k, results in more than 5k thinking tokens.

The problem only occurs when setting response_mime_type="application/json". But removing this is not an option, since we need structured output.
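For anyone trying to reproduce, a minimal sketch with the google-genai SDK (the prompt and client setup are placeholders; the model name is the one from this report):

from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents=["List three colors as a JSON array."],  # placeholder prompt
    config=genai.types.GenerateContentConfig(
        response_mime_type="application/json",  # the structured output we need
        thinking_config=genai.types.ThinkingConfig(thinking_budget=0),
    ),
)
# With the bug, this prints a non-zero count despite thinking_budget=0.
print(response.usage_metadata.thoughts_token_count)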

stri8ed avatar Sep 25 '25 19:09 stri8ed

Agreed. It causes inconsistent outputs. gemini-2.5-flash-preview-09-2025 doesn't respect thinking_budget. In Google AI Studio, it works fine.

> This issue is occurring with gemini-2.5-flash-preview-09-2025. It does not respect the thinking_budget: setting the budget to 0 has no impact, and even setting it to a definite amount, e.g. 5k, results in more than 5k thinking tokens.

sirusbaladi avatar Nov 06 '25 01:11 sirusbaladi

Confirmed, seeing the same issue. Any update @Giom-V ?

marcwestermann avatar Nov 13 '25 18:11 marcwestermann

This is still being worked on, but we should have a solution soon, I hope.

Giom-V avatar Nov 13 '25 22:11 Giom-V