Bug Report: Server tool input reconstruction missing in streaming

Summary

The SDK correctly reconstructs input fields for tool_use blocks during streaming via input_json_delta events, but fails to do the same for server_tool_use blocks (e.g., code execution tool). This creates inconsistent behavior between client and server tools, breaking legitimate use cases like code extraction from streaming responses. If confirmed, I can submit a PR.

Use Case & Context

I was building a math solver application that uses Claude's code execution tool with streaming for a better user experience. The application needed to:

Stream the response for real-time feedback
Extract the executed code blocks for logging/analysis
Provide a smooth user experience with both streaming and code extraction

Problem Description

When using streaming with server tools (like code_execution_20250522), the final message contains server_tool_use blocks with empty input dictionaries, making it impossible to extract the actual code that was executed.

Expected Behavior

# After streaming completion
for item in final_message.content:
    if item.type == "server_tool_use" and item.name == "code_execution":
        print(item.input)  # Should contain: {"code": "print(2 + 2)"}

Actual Behavior

# After streaming completion  
for item in final_message.content:
    if item.type == "server_tool_use" and item.name == "code_execution":
        print(item.input)  # Actually contains: {}

Investigation & Root Cause

What We Tried

Non-streaming vs Streaming comparison: Non-streaming works perfectly, streaming fails
Different streaming approaches: Both simple text_stream and complex event handling fail
current_message_snapshot inspection: Same empty inputs (it's the same object as get_final_message())
Manual delta reconstruction: Successfully implemented by tracking input_json_delta events
Client vs Server tool comparison: Client tools work, server tools don't

Root Cause in SDK Source Code

Found in src/anthropic/lib/streaming/_messages.py, line 431:

elif event.delta.type == "input_json_delta":
    if content.type == "tool_use":  # ← Only handles CLIENT tools
        from jiter import from_json
        # JSON reconstruction logic...
        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")
        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)
        setattr(content, JSON_BUF_PROPERTY, json_buf)
    # Missing: elif content.type == "server_tool_use": block

The SDK only reconstructs inputs for tool_use (client tools), completely ignoring server_tool_use (server tools).

Reproduction Steps

Minimal Test Case

import os
from anthropic import Anthropic

client = Anthropic(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    default_headers={"anthropic-beta": "code-execution-2025-05-22"}
)

# Test with streaming
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    
    final_message = stream.get_final_message()
    
    for item in final_message.content:
        if item.type == "server_tool_use":
            print(f"\nServer tool input: {item.input}")  # Shows: {}

# Compare with non-streaming (works correctly)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
)

for item in response.content:
    if item.type == "server_tool_use":
        print(f"Non-streaming input: {item.input}")  # Shows: {"code": "print(2 + 2)"}

Client vs Server Tool Comparison

# Reuses `client` from the minimal test case above.

# CLIENT TOOL (works with streaming)
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "name": "get_weather",
        "description": "Get weather",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }],
    tool_choice={"type": "tool", "name": "get_weather"}
) as stream:
    # get_final_message() waits for the stream to finish before returning
    final_message = stream.get_final_message()

for item in final_message.content:
    if item.type == "tool_use":
        print(item.input)  # ✅ Shows: {"location": "Paris"}

# SERVER TOOL (broken with streaming)
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Calculate 2+2 using Python"}],
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
) as stream:
    final_message = stream.get_final_message()

for item in final_message.content:
    if item.type == "server_tool_use":
        print(item.input)  # ❌ Shows: {}

Evidence from API Documentation

The official streaming documentation clearly shows that input_json_delta events are sent for server tools:

// Code execution streaming example from docs
event: content_block_delta  
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{\"code\":\"import pandas as pd\\ndf = pd.read_csv('data.csv')\\nprint(df.head())\""}}

The API sends the data, but the SDK ignores it for server tools.

Impact

This affects any application that needs to:

Extract executed code for logging/analysis
Build debugging tools for AI code execution
Implement code history/replay features
Provide transparency about what code was run
Create educational tools showing step-by-step code execution

Recommended Fix

Extend the existing reconstruction logic to handle server tools:

elif event.delta.type == "input_json_delta":
    if content.type == "tool_use":
        # existing client tool logic
        from jiter import from_json
        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")
        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)
        setattr(content, JSON_BUF_PROPERTY, json_buf)
    elif content.type == "server_tool_use":  # ← Add this block
        # Same reconstruction logic for server tools
        from jiter import from_json
        json_buf = cast(bytes, getattr(content, JSON_BUF_PROPERTY, b""))
        json_buf += bytes(event.delta.partial_json, "utf-8")
        if json_buf:
            content.input = from_json(json_buf, partial_mode=True)
        setattr(content, JSON_BUF_PROPERTY, json_buf)
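
Since the two branches are identical, the same idea could also be expressed as a single membership test. The following is only a standalone sketch of that shape, not the SDK's code: `apply_input_json_delta`, `TOOL_BLOCK_TYPES`, and the value given to `JSON_BUF_PROPERTY` are hypothetical names for illustration, and the standard-library `json` module stands in for jiter's `partial_mode` parsing (so `input` only updates once the buffer is complete JSON).

```python
import json
from types import SimpleNamespace

JSON_BUF_PROPERTY = "__json_buf"  # placeholder attribute name for this sketch
TOOL_BLOCK_TYPES = ("tool_use", "server_tool_use")  # client and server tools


def apply_input_json_delta(content, partial_json):
    """Append one JSON fragment to a tool block and refresh its `input`."""
    if content.type not in TOOL_BLOCK_TYPES:
        return  # text and other block types carry no JSON input
    json_buf = getattr(content, JSON_BUF_PROPERTY, b"") + partial_json.encode("utf-8")
    setattr(content, JSON_BUF_PROPERTY, json_buf)
    try:
        content.input = json.loads(json_buf)
    except json.JSONDecodeError:
        # Buffer is still incomplete. The SDK snippet uses jiter's partial_mode
        # here; the stdlib parser cannot, so the previous value is kept.
        pass


# Simulated server tool block receiving its input in two fragments
block = SimpleNamespace(type="server_tool_use", input={})
for fragment in ['{"code": "pri', 'nt(2 + 2)"}']:
    apply_input_json_delta(block, fragment)
print(block.input)
```

With one shared code path, a client tool block and a server tool block fed the same fragments end up with the same `input`, which is the consistency this report asks for.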

Workaround (Manual Implementation)

We successfully implemented manual delta tracking as a workaround:

def extract_code_blocks_streaming_fixed(response):
    """Working code extraction with manual delta reconstruction."""
    code_blocks = []
    accumulated_deltas = {}  # Track by content block index
    
    # During streaming, accumulate input_json_delta events
    # Then manually parse and reconstruct after completion
    # (Full implementation available if needed)
    
    return code_blocks

But this should not be necessary - the SDK should handle this automatically like it does for client tools.

Environment

anthropic-sdk-python: Latest version
Python: 3.9+
Model: claude-sonnet-4-20250514
Tool: code_execution_20250522

Conclusion

This appears to be an oversight in the SDK implementation rather than intentional design. The API sends input_json_delta events for server tools, the documentation shows examples of it, and the SDK already has the reconstruction logic - it just doesn't apply it consistently to both tool types.

The fix would be minimal, low-risk, and would restore API consistency while enabling legitimate use cases.

May 24 '25 17:05 BexTuychiev

Thanks for the bug report, we're looking into it!

May 28 '25 03:05 chaselambda

Hey @BexTuychiev thanks so much for the very thorough report! I merged in a fix for this today and it should go out in the next release: https://github.com/anthropics/anthropic-sdk-python/pull/960. I also added some testing to better catch and prevent this sort of thing in the future.

May 29 '25 18:05 dtmeadows