DeepSeek V3.2: reasoning_content not cleared from message history on new turns, causing excess token usage and violating API spec

Open · cperion opened this issue 1 month ago · 6 comments

Summary

OpenCode does not clear reasoning_content from previous conversation turns when sending messages to DeepSeek models, which violates the DeepSeek Thinking Mode API specification and causes unnecessary token usage, increased costs, and slower responses.

Problem

According to DeepSeek's official documentation for V3.2 models with thinking mode:

"In each turn of the conversation, the model outputs the CoT (reasoning_content) and the final answer (content). In the next turn of the conversation, the CoT from previous turns is not concatenated into the context"

The spec explicitly shows a clear_reasoning_content() function that should be called before Turn 2:

def clear_reasoning_content(messages):
    for message in messages:
        if hasattr(message, 'reasoning_content'):
            message.reasoning_content = None
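
A rough TypeScript equivalent over plain OpenAI-style message objects might look like the sketch below; the field name reasoning_content follows DeepSeek's docs, while the function name and message shape are assumptions for illustration:

function clearReasoningContent(messages: Array<Record<string, unknown>>): void {
  for (const message of messages) {
    // Drop CoT carried over from previous turns before the next request
    if ("reasoning_content" in message) {
      delete message.reasoning_content
    }
  }
}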

Current Behavior

OpenCode's logic in packages/opencode/src/provider/transform.ts currently:

  • ✅ Correctly adds reasoning_content for tool call continuations within the same turn
  • ✅ Strips reasoning from individual messages without tool calls
  • ❌ Does not clear reasoning_content from ALL assistant messages in history when a new user turn begins

Result: reasoning chains from previous turns accumulate in the message history and are re-sent with every new turn.
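
For contrast, here is roughly what the within-turn case (the first ✅ above) looks like; the providerOptions/openaiCompatible shape mirrors the transform code in the Suggested Fix below, and the specific field values are illustrative assumptions:

const toolCallContinuation = {
  role: "assistant",
  content: [{ type: "tool-call", toolCallId: "call_1", toolName: "search", args: {} }],
  providerOptions: {
    openaiCompatible: {
      // Current-turn CoT is echoed back so interleaved thinking can resume
      reasoning_content: "current-turn chain of thought",
    },
  },
}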

Impact

  • Wasted tokens: You pay for repeated reasoning_content from prior turns
  • Higher costs: DeepSeek charges per token
  • Slower responses: Larger context = more processing time
  • Context window pressure: Fills up context limits faster with redundant data
  • Spec violation: Not following DeepSeek's documented API contract

Concrete Example

Turn 1:

  • User asks question
  • Model reasons: 500 tokens
  • Makes tool call
  • Model reasons more: 300 tokens
  • Gives answer
  • Total reasoning: 800 tokens

Turn 2 (new user question):

  • OpenCode sends Turn 1's 800 reasoning tokens again
  • Model generates new reasoning: 600 tokens
  • Requests in Turn 2 end up carrying 800 (old) + 600 (new) = 1,400 reasoning tokens
  • They should carry only the current turn's 600 tokens
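
To make the accumulation concrete, a Turn 2 request body would look roughly like this (a simplified sketch; the message shape follows DeepSeek's OpenAI-compatible chat API):

const turn2Request = {
  model: "deepseek-reasoner",
  messages: [
    { role: "user", content: "first question" },
    {
      role: "assistant",
      content: "first answer",
      // ~800 tokens of Turn 1 CoT that should have been stripped:
      reasoning_content: "...Turn 1 reasoning...",
    },
    { role: "user", content: "second question" },
  ],
}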

Expected Behavior

When a new user message starts a fresh turn, OpenCode should:

  1. Detect the turn boundary (new user message)
  2. Strip reasoning_content from ALL previous assistant messages before sending to the API
  3. Send only current-turn reasoning to DeepSeek

Reproduction

  1. Start a conversation with a DeepSeek V3.2 model (deepseek-chat with thinking enabled, or deepseek-reasoner)
  2. Run a multi-step exchange that produces reasoning alongside tool calls
  3. On the second and subsequent user messages, inspect the request payload (see the proxy sketch below): the previous turns' reasoning is sent again
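
One way to inspect the payload is to point the provider's base URL at a small logging proxy. A minimal sketch, assuming your provider config lets you override the base URL and that the upstream is https://api.deepseek.com (responses are buffered here, which breaks streaming but is fine for inspection):

import http from "node:http"

const UPSTREAM = "https://api.deepseek.com"

http
  .createServer(async (req, res) => {
    const chunks: Buffer[] = []
    for await (const chunk of req) chunks.push(chunk as Buffer)
    const body = Buffer.concat(chunks)

    // Log any reasoning_content still present in the outgoing history
    try {
      const payload = JSON.parse(body.toString())
      for (const msg of payload.messages ?? []) {
        if (msg.reasoning_content) {
          console.log(`stale reasoning on ${msg.role} message:`, String(msg.reasoning_content).slice(0, 80))
        }
      }
    } catch {
      // not JSON; ignore
    }

    // Forward the request unchanged and relay the response
    const upstream = await fetch(UPSTREAM + req.url, {
      method: req.method,
      headers: {
        "content-type": "application/json",
        authorization: String(req.headers.authorization ?? ""),
      },
      body: body.length ? body : undefined,
    })
    res.writeHead(upstream.status, { "content-type": "application/json" })
    res.end(Buffer.from(await upstream.arrayBuffer()))
  })
  .listen(8787)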

Suggested Fix

Add turn boundary detection in packages/opencode/src/provider/transform.ts:

if (model.providerID === "deepseek" || model.api.id.toLowerCase().includes("deepseek")) {
  // Find last user message (start of current turn)
  let lastUserIndex = -1
  for (let i = msgs.length - 1; i >= 0; i--) {
    if (msgs[i].role === "user") {
      lastUserIndex = i
      break
    }
  }

  return msgs.map((msg, index) => {
    if (msg.role === "assistant" && Array.isArray(msg.content)) {
      const reasoningParts = msg.content.filter((part: any) => part.type === "reasoning")
      const hasToolCalls = msg.content.some((part: any) => part.type === "tool-call")
      const reasoningText = reasoningParts.map((part: any) => part.text).join("")
      const filteredContent = msg.content.filter((part: any) => part.type !== "reasoning")

      // Only include reasoning_content for messages in the current turn (after lastUserIndex)
      if (hasToolCalls && reasoningText && index > lastUserIndex) {
        return {
          ...msg,
          content: filteredContent,
          providerOptions: {
            ...msg.providerOptions,
            openaiCompatible: {
              ...(msg.providerOptions as any)?.openaiCompatible,
              reasoning_content: reasoningText,
            },
          },
        }
      }
      // Strip reasoning from all other messages
      return {
        ...msg,
        content: filteredContent,
      }
    }
    return msg
  })
}
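
A quick sanity check of the intended behavior (the message history below is illustrative; the part shapes follow the transform code above):

const history = [
  { role: "user", content: "turn 1 question" },
  {
    role: "assistant",
    content: [
      { type: "reasoning", text: "turn 1 chain of thought" },
      { type: "tool-call", toolCallId: "call_1", toolName: "search", args: {} },
    ],
  },
  { role: "user", content: "turn 2 question" }, // new turn boundary
]
// After the transform runs, the Turn 1 assistant message should have no
// reasoning parts and no providerOptions.openaiCompatible.reasoning_content,
// because its index is <= lastUserIndex.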

Environment

OpenCode Version: 1.0.x (current)
Affected Models: All DeepSeek V3.2 models with interleaved thinking support (deepseek-chat with thinking enabled, deepseek-reasoner)

cperion · Dec 15 '25 20:12

This issue might be a duplicate of existing issues. Please check:

  • #5027: Deepseek missing "reasoning_content" field; deepseek_reasoner model erroring out - Similar issue with DeepSeek reasoning_content handling
  • #3035: Do not send reasoning traces and tool calls made during reasoning when switching from reasoning model to non reasoning model - Related to reasoning content accumulation in message history
  • #4895: Excluding reasoning context to reduce token usage - Related feature request to optimize token consumption by excluding reasoning content

Feel free to ignore if none of these address your specific case.

github-actions[bot] · Dec 15 '25 20:12

Maybe we should be a bit more flexible than the code snippet I provided, but overall the issue is mainly correct.

cperion · Dec 15 '25 20:12

@cperion in their docs it says:

When the next user question begins (Turn 2.1), the previous reasoning_content should be removed, while keeping other elements to send to the API. If reasoning_content is retained and sent to the API, the API will ignore it.

It says the only cost of not clearing it is network bandwidth; the API will remove it automatically:

We recommended to clear the reasoning_content in history messages so as to save network bandwidth: clear_reasoning_content(messages)

Now, should we follow their guidance? Yes. But "increasing token usage" and "violating the spec" seem incorrect: the spec allows retaining it (it only discourages it to save bandwidth), and it says nothing about token usage here.

rekram1-node · Dec 15 '25 20:12

From my understanding, that means the current behavior is correct.

cperion · Dec 15 '25 20:12

I'll check the token count though, because in my understanding the token percentage should decrease a bit after each user message (the old thoughts should not be counted), and that is not what I observed today while working with it.

cperion · Dec 15 '25 20:12

Okay, that second comment you made may be correct; there could be a bug in our visual (perceived) token-counting logic. That'd be interesting to see.

rekram1-node · Dec 15 '25 20:12