
claude sonnet 4 extended thinking requires thinking block on final message

Open rbitar opened this issue 4 months ago • 20 comments

Description

Issue: An error is thrown when using the Anthropic "claude-4-sonnet-20250514" model because the last assistant message is missing a thinking block, which the Anthropic spec requires.

Fix: When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results. Source: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#example-working-with-redacted-thinking-blocks

Error message:

Expected `thinking` or `redacted_thinking`, but found `tool_use`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

Example: an assistant message whose content starts with a thinking block:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
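
For context, the follow-up request after a tool call keeps that thinking block at the start of the assistant turn, with the tool result returned in the next user turn. A hedged sketch of the message shapes, with placeholder values ("toolu_...", "web_search"), based on the linked Anthropic docs:

const followUpMessages = [
  {
    role: "assistant",
    content: [
      // the preserved thinking block must come before tool_use
      { type: "thinking", thinking: "Let me analyze this step by step...", signature: "..." },
      { type: "tool_use", id: "toolu_...", name: "web_search", input: { query: "..." } },
    ],
  },
  {
    role: "user",
    content: [
      { type: "tool_result", tool_use_id: "toolu_...", content: "..." },
    ],
  },
];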

AI SDK Version

  • ai: 5.0.0
  • @ai-sdk/anthropic: 2.0.0
  • @ai-sdk/react: 2.0.0

rbitar avatar Aug 03 '25 14:08 rbitar

@rbitar can you give me a code snippet reproducing the error?

dancer avatar Aug 04 '25 15:08 dancer

@dancer Here is sample code from the chat route that enables thinking for Anthropic Claude Sonnet 4 and occasionally (but not consistently) triggers the error.

// File: api/chat/route.tsx
import {
  streamText,
  generateId,
  stepCountIs,
  convertToModelMessages,
} from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// ...

 const result = await streamText({
      model: anthropic("claude-4-sonnet-20250514"),
      messages: convertToModelMessages(messages),
      maxOutputTokens: 640000,
      stopWhen: stepCountIs(100),
      tools: {
        web_search: webSearch,
         ... // add tools here for testing
      },
      prepareStep: async ({ stepNumber, steps, messages }) => {
        // Compress conversation history for longer loops
        if (messages.length > 20) {
          return {
            messages: messages.slice(-10),
          };
        }
        return {};
      },
      providerOptions: {
        anthropic: {
          headers: {
            "anthropic-beta": "output-128k-2025-02-19",
          },
          thinking: {
            type: "enabled",
            budgetTokens: 1024,
          },
        },
      },
    });

    return result.toUIMessageStreamResponse({
      sendReasoning: true
    });

rbitar avatar Aug 05 '25 18:08 rbitar

I ran into this issue as well. For me, it was not related to upgrading from a v5 beta version to the v5 release version. I've tried it with both of the following version sets:

  • ai: 5.0.4
  • @ai-sdk/anthropic: 2.0.1
  • @ai-sdk/react: 2.0.4

AND

  • ai: 5.0.0-beta.28
  • @ai-sdk/anthropic: 2.0.0-beta.9
  • @ai-sdk/react: 2.0.0-beta.28

In my case, I introduced the same issue by switching the experimental_transform in streamText from smoothStream({ chunking: 'word' }) to a custom implementation. I do not get the error when moving back to the smoothStream implementation provided by the ai library; that also works with the latest v5 versions listed above.

Why did I create a custom smoothStream implementation?

The smoothStream implementation from the ai package only smooths parts of type text, not reasoning; see https://github.com/vercel/ai/issues/5784

I tried to implement a custom smoothStream that handles both part types. Here is the implementation that then caused the Anthropic issue above, in case it is of interest (based on the ai implementation, but also smoothing reasoning-delta with its own buffer):

import { InvalidArgumentError } from '@ai-sdk/provider';
import type { TextStreamPart, ToolSet } from 'ai';
import { delay as originalDelay } from '@/lib/delay';

const CHUNKING_REGEXPS = {
  word: /\S+\s+/m,
  line: /\n+/m,
};

/**
 * Detects the first chunk in a buffer.
 *
 * @param buffer - The buffer to detect the first chunk in.
 *
 * @returns The first detected chunk, or `undefined` if no chunk was detected.
 */
export type ChunkDetector = (buffer: string) => string | undefined | null;

/**
 * Smooths text streaming output.
 *
 * @param delayInMs - The delay in milliseconds between each chunk. Defaults to 10ms. Can be set to `null` to skip the delay.
 * @param chunking - Controls how the text is chunked for streaming. Use "word" to stream word by word (default), "line" to stream line by line, or provide a custom RegExp pattern for custom chunking.
 *
 * @returns A transform stream that smooths text streaming output.
 */
export function smoothStream<TOOLS extends ToolSet>({
  delayInMs = 10,
  chunking = 'word',
  _internal: { delay = originalDelay } = {},
}: {
  delayInMs?: number | null;
  chunking?: 'word' | 'line' | RegExp | ChunkDetector;
  /**
   * Internal. For test use only. May change without notice.
   */
  _internal?: {
    delay?: (delayInMs: number | null) => Promise<void>;
  };
} = {}): (options: {
  tools: TOOLS;
}) => TransformStream<TextStreamPart<TOOLS>, TextStreamPart<TOOLS>> {
  let detectChunk: ChunkDetector;

  if (typeof chunking === 'function') {
    detectChunk = (buffer) => {
      const match = chunking(buffer);

      if (match == null) {
        return null;
      }

      if (!match.length) {
        throw new Error(`Chunking function must return a non-empty string.`);
      }

      if (!buffer.startsWith(match)) {
        throw new Error(
          `Chunking function must return a match that is a prefix of the buffer. Received: "${match}" expected to start with "${buffer}"`
        );
      }

      return match;
    };
  } else {
    const chunkingRegex =
      typeof chunking === 'string' ? CHUNKING_REGEXPS[chunking] : chunking;

    if (chunkingRegex == null) {
      throw new InvalidArgumentError({
        argument: 'chunking',
        message: `Chunking must be "word" or "line" or a RegExp. Received: ${chunking}`,
      });
    }

    detectChunk = (buffer) => {
      const match = chunkingRegex.exec(buffer);

      if (!match) {
        return null;
      }

      return buffer.slice(0, match.index) + match?.[0];
    };
  }

  return () => {
    let textBuffer = '';
    let textId = '';
    let reasoningBuffer = '';
    let reasoningId = '';
    let controller: TransformStreamDefaultController<TextStreamPart<TOOLS>>;

    const flushBuffer = (
      buffer: string,
      id: string,
      type: 'text-delta' | 'reasoning-delta'
    ) => {
      if (buffer.length > 0) {
        controller.enqueue({ type, text: buffer, id });
      }
    };

    const processChunks = async (
      buffer: string,
      id: string,
      type: 'text-delta' | 'reasoning-delta'
    ) => {
      // biome-ignore lint/suspicious/noEvolvingTypes: copy from Vercel AI SDK
      // biome-ignore lint/suspicious/noImplicitAnyLet: copy from Vercel AI SDK
      let match;
      let remainingBuffer = buffer;

      // biome-ignore lint/suspicious/noAssignInExpressions: copy from Vercel AI SDK
      while ((match = detectChunk(remainingBuffer)) != null) {
        controller.enqueue({ type, text: match, id });
        remainingBuffer = remainingBuffer.slice(match.length);
        // biome-ignore lint/nursery/noAwaitInLoop: doing it explicitly for the smoothing effect
        await delay(delayInMs);
      }

      return remainingBuffer;
    };

    const handleTextDelta = async (chunk: {
      type: 'text-delta';
      text: string;
      id: string;
    }) => {
      if (reasoningBuffer.length > 0) {
        flushBuffer(reasoningBuffer, reasoningId, 'reasoning-delta');
        reasoningBuffer = '';
      }

      if (chunk.id !== textId && textBuffer.length > 0) {
        flushBuffer(textBuffer, textId, 'text-delta');
        textBuffer = '';
      }

      textBuffer += chunk.text;
      textId = chunk.id;
      textBuffer = await processChunks(textBuffer, textId, 'text-delta');
    };

    const handleReasoningDelta = async (chunk: {
      type: 'reasoning-delta';
      text: string;
      id: string;
    }) => {
      if (textBuffer.length > 0) {
        flushBuffer(textBuffer, textId, 'text-delta');
        textBuffer = '';
      }

      if (chunk.id !== reasoningId && reasoningBuffer.length > 0) {
        flushBuffer(reasoningBuffer, reasoningId, 'reasoning-delta');
        reasoningBuffer = '';
      }

      reasoningBuffer += chunk.text;
      reasoningId = chunk.id;
      reasoningBuffer = await processChunks(
        reasoningBuffer,
        reasoningId,
        'reasoning-delta'
      );
    };

    const handleOtherChunk = (chunk: TextStreamPart<TOOLS>) => {
      flushBuffer(textBuffer, textId, 'text-delta');
      textBuffer = '';
      flushBuffer(reasoningBuffer, reasoningId, 'reasoning-delta');
      reasoningBuffer = '';
      controller.enqueue(chunk);
    };

    return new TransformStream<TextStreamPart<TOOLS>, TextStreamPart<TOOLS>>({
      start(ctrl) {
        controller = ctrl;
      },
      async transform(chunk, ctrl) {
        controller = ctrl;

        if (chunk.type === 'text-delta') {
          await handleTextDelta(chunk);
        } else if (chunk.type === 'reasoning-delta') {
          await handleReasoningDelta(chunk);
        } else {
          handleOtherChunk(chunk);
        }
      },
    });
  };
}

Hope this helps isolate the issue. I'll try to fix my custom smoothStream implementation.

markusjura avatar Aug 06 '25 12:08 markusjura

I ran into the same issue. I was using experimental_transform and manually filtering out chunks, but I accidentally didn't retain the providerMetadata from the reasoning-delta chunk. The providerMetadata actually arrives in the last reasoning-delta chunk, whose text value is an empty string. By mistake, I didn't enqueue that chunk, which left the metadata undefined and caused the issue.
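
A minimal sketch of the fix, assuming a custom experimental_transform (the function name is hypothetical): enqueue reasoning-delta chunks even when their text is empty, so the providerMetadata on the final one survives.

import type { TextStreamPart, ToolSet } from 'ai';

export function metadataSafeTransform<TOOLS extends ToolSet>() {
  return () =>
    new TransformStream<TextStreamPart<TOOLS>, TextStreamPart<TOOLS>>({
      transform(chunk, controller) {
        if (chunk.type === 'reasoning-delta') {
          // The final reasoning-delta has text === '' but carries the
          // providerMetadata (the Anthropic thinking signature), so it
          // must be forwarded even though it looks empty.
          controller.enqueue(chunk);
          return;
        }
        // ...apply any custom filtering for other chunk types here...
        controller.enqueue(chunk);
      },
    });
}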

adithya-swipepages avatar Aug 22 '25 16:08 adithya-swipepages

I'm running into this when combining tool calling with extended thinking. Both work fine when used without the other, but when the llm tries to call a tool, I see this error immediately after the tool call completes.

Here's the ordering of things I'm seeing in the AI stream:

  1. start
  2. start-step
  3. reasoning-start
  4. reasoning-delta
  5. tool-input-start - Tool call begins (toolName: "search")
  6. tool-input-delta
  7. tool-input-available
  8. reasoning-end
  9. tool-output-available
  10. finish-step
  11. error - Anthropic API rejects the second request

So it seems like it errors here because there's no thinking step ending this message.

I asked opus 4 to analyze this with ai codebase context...not sure how accurate this is but maybe sheds some light? Sorry if it's just slop lol.


When the AI SDK prepares the second API call (after tool execution), it reconstructs the assistant message from these parts. The SDK internally converts:
- reasoning-delta parts → thinking block
- tool-input-* parts → tool_use block

However, because the tool-input parts appear BETWEEN the reasoning-delta and reasoning-end parts, the SDK likely processes them in that order, resulting in:

Assistant message content: [
  { type: 'tool_use', ... },    // From tool-input parts
  { type: 'thinking', ... }      // From reasoning-delta parts
]

Anthropic's API requires that when extended thinking is enabled, thinking blocks MUST precede tool_use blocks. The error confirms this: "When thinking is enabled, a final assistant message must start with a thinking block (preceding the lastmost set of tool_use and tool_result blocks)".

The SDK needs to reorder these blocks to satisfy Anthropic's requirement, regardless of the order they were streamed.
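
If that analysis is right, the needed reordering would look something like this (purely illustrative sketch, not actual SDK code):

type ContentBlock = { type: string } & Record<string, unknown>;

// Move thinking/redacted_thinking blocks ahead of everything else so the
// assistant message starts with a thinking block, as Anthropic requires.
function reorderThinkingFirst(content: ContentBlock[]): ContentBlock[] {
  const isThinking = (block: ContentBlock) =>
    block.type === 'thinking' || block.type === 'redacted_thinking';
  return [
    ...content.filter(isThinking),
    ...content.filter((block) => !isThinking(block)),
  ];
}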

seankwalker avatar Aug 26 '25 04:08 seankwalker

One thing I just ran into was the unexpected change in model identifiers with the 4.0 models. claude-4-sonnet-20250514 is not the correct model identifier for Sonnet 4. As of the v4 models, Anthropic swapped the ordering of the version number and model family. So while the claude-3-7-sonnet-20250219 structure was correct up to 3.7, it should now be claude-sonnet-4-20250514.

nspady avatar Sep 05 '25 18:09 nspady

@nspady not sure how the change of Anthropic's model naming convention relates to this issue?

gr2m avatar Sep 08 '25 00:09 gr2m

Ah the example code shared by @rbitar above uses the wrong model identifier.

gr2m avatar Sep 08 '25 00:09 gr2m

@nspady @gr2m Both formats of the model name seem to map to the same Claude Sonnet 4 version, so it looks like it's backwards compatible. An invalid model name would throw an exception anyway, so Anthropic does accept it.

Regardless, I did test out both formats and with reasoning enabled I still get the same error message.

rbitar avatar Sep 08 '25 05:09 rbitar

I can also confirm this is a consistent error I run into when combining thinking and tool use

sccorby avatar Sep 08 '25 17:09 sccorby

> I can also confirm this is a consistent error I run into when combining thinking and tool use

I ran into the same issue. For me, I was not including the signature in my reasoning blocks, which caused the reasoning blocks to be silently filtered out.

This is what ended up working for me:

[
  {
    role: 'system',
    content: 'System Prompt'
  },
  {
    role: 'user',
    content: 'User prompt'
  },
  {
    role: 'assistant',
    content: [
      {
        type: 'reasoning',
        text: 'reasoning text',
        providerOptions: { // This is the bit I was missing
          anthropic: {
            signature: 'signature'
          }
        }
      },
      {
        type: 'tool-call',
        toolCallId: 'toolu_01EimzZTFSYWdPP6tZHi2QAb',
        toolName: 'dsgQfJKJ',
        input: {
          query: 'query',
          maxResults: 50,
          justification: 'etc'
        },
        providerExecuted: undefined,
        providerMetadata: undefined
      }
    ]
  },
  {
    role: 'tool',
    content: [
      {
        type: 'tool-result',
        toolCallId: 'toolu_01EimzZTFSYWdPP6tZHi2QAb',
        toolName: 'dsgQfJKJ',
        output: {
          type: 'json',
          value: '.....'
        }
      }
    ]
  }
]

richrliu avatar Sep 11 '25 16:09 richrliu

I was using amazon-bedrock and ran into the same issue when using anthropic as the providerOptions key; using bedrock as the key fixed it for me:

{
  type: 'reasoning',
  text: 'reasoning text',
  providerOptions: {
    bedrock: {
      signature: 'signature'
    }
  }
},

RuoniWang-at avatar Sep 15 '25 21:09 RuoniWang-at

      prepareStep: async ({ stepNumber, steps, messages }) => {
        // Compress conversation history for longer loops
        if (messages.length > 20) {
          return {
            messages: messages.slice(-10),
          };
        }
        return {};
      },

Altering the message history can lead to problems because of Anthropic's thinking-block signatures. Can you reproduce the problem without the prepareStep / prompt compression?
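
If compression is needed anyway, a safer sketch (hypothetical, untested) would cut at a user-message boundary so an assistant turn's thinking/tool_use blocks are never separated from their tool results:

import type { ModelMessage } from 'ai';

// Trim to the last N messages, then drop leading messages until the
// window starts on a user message, so we never begin mid-way through an
// assistant thinking/tool_use ... tool_result exchange.
function truncateAtUserBoundary(messages: ModelMessage[], keep = 10): ModelMessage[] {
  const tail = messages.slice(-keep);
  const firstUser = tail.findIndex((message) => message.role === 'user');
  return firstUser === -1 ? tail : tail.slice(firstUser);
}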

> I can also confirm this is a consistent error I run into when combining thinking and tool use

@sccorby can you reproduce it with thinking and tool use, but without using experimental_transform? It's often the source of problems: if a signature is lost or a thinking message is altered, Anthropic will throw an error.

gr2m avatar Sep 24 '25 16:09 gr2m

@gr2m The issue resolved for me by switching to the anthropic/claude-sonnet-4-5 model routed through Amazon Bedrock via the Vercel AI Gateway, instead of using anthropic("claude-sonnet-4-20250514") from @ai-sdk/anthropic. Happy to consider this a viable solution for now and close this ticket. Truncating the messages with prepareStep to preserve context length doesn't seem to cause any issues with these models.

rbitar avatar Oct 05 '25 17:10 rbitar

I realized the issue in our case -- we're not using experimental_transform or prepareStep or anything, but we are using the openai-compatible provider (against Litellm), not anthropic.

I looked into the code for each: the anthropic provider has logic to handle reasoning parts and attach the signature metadata/thinking types (link), but this isn't done in the openai-compatible provider (which makes sense; that sort of provider-specific logic wouldn't belong in the generic provider) (link).

@gr2m or anyone else, is this a use case that's reasonable for the openai-compatible provider? For us it makes sense because we want to be able to use thinking across various models in our Litellm instance. Happy to provide more context or work on a patch, but wanted to check if you had an opinion on the issue or how you'd want support for something like this to be implemented (e.g. could we add a custom transform function for handling provider-specific message parts)?

seankwalker avatar Oct 06 '25 22:10 seankwalker

@rbitar I'd appreciate not closing this ticket. This is a trivial issue anyone using tools + anthropic with AI SDK would face. Any ETA/thoughts about a fix?

arielweinberger avatar Oct 16 '25 00:10 arielweinberger

Came across the same thing today, and I can confirm that when using wrapLanguageModel to add a signature property to the reasoning block, the signature does get sent to the LLM. But then it fails, of course, because the reasoning signature is not valid.

arielweinberger avatar Oct 16 '25 00:10 arielweinberger

I'm getting the same issue. Even the response object out of streamText doesn't include the correct signature.

kenmueller avatar Oct 17 '25 23:10 kenmueller

When you encounter this issue, please check the following:

a) are you using prepareStep to change the messages?
b) are you using a stream transformation (in particular a custom one)?
c) are you using a provider other than @ai-sdk/anthropic?

All of these can be related to the issue. The AI SDK stores additional information in the provider metadata (here: Anthropic reasoning signatures) that is required by the Anthropic API if you want to send reasoning parts back. If this information gets changed, some of it is lost, or other parts of the messages object change, there is a risk that the Anthropic API will reject the request.
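
One way to narrow it down is an illustrative debugging sketch (using the providerOptions key path from the working example above) that checks whether assistant reasoning parts still carry their signature before the next request:

import type { ModelMessage } from 'ai';

function warnOnMissingSignatures(messages: ModelMessage[]) {
  for (const message of messages) {
    if (message.role !== 'assistant' || !Array.isArray(message.content)) continue;
    for (const part of message.content) {
      if (part.type === 'reasoning') {
        // Provider options are provider-specific, hence the cast.
        const signature = (part.providerOptions as any)?.anthropic?.signature;
        if (!signature) console.warn('reasoning part lost its signature:', part);
      }
    }
  }
}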

lgrammel avatar Oct 18 '25 07:10 lgrammel

@jmif's solution (https://github.com/vercel/ai/pull/7750) looks good to me. Can we consider it?

OoO256 avatar Nov 12 '25 06:11 OoO256