DeepSeek V3.2 : reasoning_content not cleared from message history on new turns, causing excess token usage and violating API spec
Summary
OpenCode does not clear reasoning_content from previous conversation turns when sending messages to DeepSeek models, which violates the DeepSeek Thinking Mode API specification and causes unnecessary token usage, increased costs, and slower responses.
Problem
According to DeepSeek's official documentation for V3.2 models with thinking mode:
"In each turn of the conversation, the model outputs the CoT (reasoning_content) and the final answer (content). In the next turn of the conversation, the CoT from previous turns is not concatenated into the context"
The spec explicitly shows a clear_reasoning_content() function that should be called before Turn 2:
```python
def clear_reasoning_content(messages):
    for message in messages:
        if hasattr(message, 'reasoning_content'):
            message.reasoning_content = None
```
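For reference, a minimal TypeScript equivalent of that clearing step for an OpenAI-compatible message array (only the `reasoning_content` field name comes from DeepSeek's docs; the rest of the message shape here is an assumption for illustration):

```typescript
// Sketch: null out reasoning_content on every history message before
// starting the next turn. Message shape is an illustrative assumption.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool"
  content: string
  reasoning_content?: string | null
}

function clearReasoningContent(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) =>
    m.reasoning_content != null ? { ...m, reasoning_content: null } : m,
  )
}
```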
Current Behavior
OpenCode's logic in packages/opencode/src/provider/transform.ts currently:
- ✅ Correctly adds `reasoning_content` for tool-call continuations within the same turn
- ✅ Strips reasoning from individual messages without tool calls
- ❌ Does not clear `reasoning_content` from ALL assistant messages in history when a new user turn begins
Result: reasoning chains from previous turns accumulate in the context window and are re-sent on every new turn.
Impact
- Wasted tokens: you pay for repeated `reasoning_content` from prior turns
- Higher costs: DeepSeek charges per token
- Slower responses: larger context means more processing time
- Context window pressure: context limits fill up faster with redundant data
- Spec violation: not following DeepSeek's documented API contract
Concrete Example
Turn 1:
- User asks question
- Model reasons: 500 tokens
- Makes tool call
- Model reasons more: 300 tokens
- Gives answer
- Total reasoning: 800 tokens
Turn 2 (new user question):
- OpenCode sends Turn 1's 800 reasoning tokens again
- Model generates new reasoning: 600 tokens
- API receives: 800 (old) + 600 (new) = 1,400 tokens
- Should only send: 600 tokens
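The waste compounds with conversation length. A quick sketch of the arithmetic (the token counts are the hypothetical figures from the example above, not measured values):

```typescript
// How many stale reasoning tokens get re-sent over a conversation if
// history is never cleared: on each turn i, all reasoning from turns
// 0..i-1 rides along again.
function resentReasoningTokens(perTurnReasoning: number[]): number {
  let resent = 0
  for (let i = 1; i < perTurnReasoning.length; i++) {
    for (let j = 0; j < i; j++) resent += perTurnReasoning[j]
  }
  return resent
}
```

For the two-turn example (800 then 600 reasoning tokens), Turn 2 needlessly resends 800 tokens; a hypothetical third turn would resend another 1,400.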
Expected Behavior
When a new user message starts a fresh turn, OpenCode should:
- Detect the turn boundary (new user message)
- Strip `reasoning_content` from ALL previous assistant messages before sending to the API
- Send only current-turn reasoning to DeepSeek
Reproduction
- Start a conversation with a DeepSeek V3 model (`deepseek-chat` with thinking enabled, or `deepseek-reasoner`)
- Make a multi-step conversation that produces reasoning with tool calls
- On the second and subsequent user messages, inspect the payload: previous turns' reasoning is sent again
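To make the inspection step concrete, here is a small debugging helper (hypothetical, not part of OpenCode) that counts messages in an outgoing request body still carrying `reasoning_content`. The body shape follows DeepSeek's OpenAI-compatible chat format; anything beyond that is an assumption:

```typescript
// Count messages in an outgoing chat-completions body that still carry
// reasoning_content from earlier turns. Zero is the expected result
// after a correctly handled turn boundary.
function countLeakedReasoning(requestBody: string): number {
  const body = JSON.parse(requestBody)
  const messages: Array<{ reasoning_content?: string }> = body.messages ?? []
  return messages.filter((m) => !!m.reasoning_content).length
}
```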
Suggested Fix
Add turn-boundary detection in `packages/opencode/src/provider/transform.ts`:
```typescript
if (model.providerID === "deepseek" || model.api.id.toLowerCase().includes("deepseek")) {
  // Find the last user message (start of the current turn)
  let lastUserIndex = -1
  for (let i = msgs.length - 1; i >= 0; i--) {
    if (msgs[i].role === "user") {
      lastUserIndex = i
      break
    }
  }
  return msgs.map((msg, index) => {
    if (msg.role === "assistant" && Array.isArray(msg.content)) {
      const reasoningParts = msg.content.filter((part: any) => part.type === "reasoning")
      const hasToolCalls = msg.content.some((part: any) => part.type === "tool-call")
      const reasoningText = reasoningParts.map((part: any) => part.text).join("")
      const filteredContent = msg.content.filter((part: any) => part.type !== "reasoning")
      // Only include reasoning_content for messages in the current turn (after lastUserIndex)
      if (hasToolCalls && reasoningText && index > lastUserIndex) {
        return {
          ...msg,
          content: filteredContent,
          providerOptions: {
            ...msg.providerOptions,
            openaiCompatible: {
              ...(msg.providerOptions as any)?.openaiCompatible,
              reasoning_content: reasoningText,
            },
          },
        }
      }
      // Strip reasoning from all other messages
      return {
        ...msg,
        content: filteredContent,
      }
    }
    return msg
  })
}
```
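A standalone sanity check of the turn-boundary idea (message shapes are simplified for illustration; OpenCode's real message types come from the AI SDK, so treat this as a sketch rather than the actual implementation): reasoning parts before the last user message are stripped, while current-turn reasoning is kept.

```typescript
// Simplified stand-ins for the AI SDK message types.
type Part = { type: string; text?: string }
type Msg = { role: string; content: string | Part[] }

function stripStaleReasoning(msgs: Msg[]): Msg[] {
  // The last user message marks the start of the current turn.
  const lastUserIndex = msgs.map((m) => m.role).lastIndexOf("user")
  return msgs.map((msg, index) => {
    if (msg.role !== "assistant" || !Array.isArray(msg.content)) return msg
    if (index > lastUserIndex) return msg // current turn: keep reasoning
    return { ...msg, content: msg.content.filter((p) => p.type !== "reasoning") }
  })
}
```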
References
- DeepSeek Thinking Mode API
- DeepSeek Multi-turn Conversation Section
- DeepSeek Tool Calls with Thinking Mode
- Current transform code location: `packages/opencode/src/provider/transform.ts`
OpenCode Version: 1.0.x (current)
Affected Models: All DeepSeek V3.2 models with interleaved thinking support (`deepseek-chat` with thinking enabled, `deepseek-reasoner`)
This issue might be a duplicate of existing issues. Please check:
- #5027: Deepseek missing "reasoning_content" field; deepseek_reasoner model erroring out - Similar issue with DeepSeek reasoning_content handling
- #3035: Do not send reasoning traces and tool calls made during reasoning when switching from reasoning model to non reasoning model - Related to reasoning content accumulation in message history
- #4895: Excluding reasoning context to reduce token usage - Related feature request to optimize token consumption by excluding reasoning content
Feel free to ignore if none of these address your specific case.
Maybe we should be a bit more flexible than the code snippet I provided; overall, however, the issue is mainly correct.
@cperion in their docs it says:
When the next user question begins (Turn 2.1), the previous reasoning_content should be removed, while keeping other elements to send to the API. If reasoning_content is retained and sent to the API, the API will ignore it.
It says the only cost of not clearing it is network bandwidth; the API will remove it automatically:

"We recommended to clear the reasoning_content in history messages so as to save network bandwidth: `clear_reasoning_content(messages)`"
Now, should we follow their guidance? Yes, but the claims of increased token usage and spec violation seem incorrect: the spec allows retaining it but discourages it for network reasons, and it says nothing about token usage here.
From my understanding, that means the current behavior is correct.
I'll check the token count though, because in my understanding the token percentage should decrease a bit after each user message (the old thoughts should not be counted), and that is not what I observed today while working with it.
Okay, that second comment you made may be correct; there could be a bug in our visual (perceived) token-counting logic. That'd be interesting to see.