Streaming support in `TextBlock`
While the SDK streams various message blocks (e.g., tool use/result, system message, result message, text message), `TextBlock` content is only delivered once it is fully available. This delays rendering for long messages even when time-to-first-token (TTFT) is low. Supporting partial streaming of `TextBlock` content would let applications render text incrementally, significantly improving the user experience.
In the current streaming guide, the entire block is received at once:
https://github.com/anthropics/claude-code-sdk-python/blob/3010aaf092186a24dae240d78f0ee7d2e0d5d62a/examples/streaming_mode_ipython.py#L26
I ran into the same problem: neither the text nor the tool parameters arrive incrementally. Both wait for the complete response to be returned.
Same problem here: a long wait, then the text appears all at once. +1 for fixing this.
Me too:
You can see the text is split into chunks: the first chunk ends with "Would you lik" and the second continues with "e to create...". The text is being streamed in 50-character chunks!
However, the chunks all appear at the same timestamp because the SDK waits for the full response before handing us the TextBlock. The underlying issue is that the Claude Agent SDK itself doesn't provide token-by-token streaming; it only provides message-level streaming.
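To make the chunk boundaries concrete, here is a minimal sketch of why a chunk can end mid-word: the chunks are simply fixed-size slices of the full text, so a boundary falls wherever the size limit lands (the sample sentence and chunk size below are illustrative, chosen so the split mirrors the "Would you lik" / "e to create..." example):

```python
full_text = "Would you like to create the file now?"

# Slice into fixed-size chunks, mimicking the chunking observed above
# (a 13-char size is used so the mid-word split is visible in a short string):
chunk_size = 13
chunks = [full_text[i:i + chunk_size] for i in range(0, len(full_text), chunk_size)]
print(chunks[0])  # → Would you lik

# Because the SDK delivers all chunks at the same time, concatenating them
# only yields readable text after the full response has already arrived:
assert "".join(chunks) == full_text
```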
More from Claude Code:
Investigation Complete: TextBlock Streaming Issue
I've completed the investigation into why TextBlock content doesn't stream properly when chunks arrive from the LLM. Here's what I found:
Root Cause
When include_partial_messages=True is enabled, the SDK receives streaming text chunks as StreamEvent objects containing raw Anthropic API events. However, these text deltas are NOT automatically accumulated into TextBlock objects.
The Problem
Current behavior:
- Users receive StreamEvent objects with raw API events
- Text chunks come in content_block_delta events
- Users must manually extract and accumulate text from event.event.get("delta", {}).get("text", "")
Expected behavior:
- Users expect TextBlock objects with accumulated text as it streams in
- Similar to how complete messages work
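The manual accumulation the SDK currently forces on users can be sketched as follows. The event dicts are simplified stand-ins for the raw Anthropic API events wrapped in `StreamEvent`; field names follow the `content_block_delta` shape described above:

```python
def accumulate_text_deltas(events):
    """Manually fold raw content_block_delta events into a full string.

    This is the bookkeeping callers must currently do themselves:
    filter for delta events, check the delta type, and concatenate.
    """
    text = ""
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text += delta.get("text", "")
    return text

# Simplified mock events standing in for what StreamEvent wraps:
events = [
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo!"}},
    {"type": "content_block_stop"},
]
print(accumulate_text_deltas(events))  # → Hello!
```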
Code Evidence
In `message_parser.py:93-94`, TextBlocks are only created from complete assistant messages:

```python
case "text":
    content_blocks.append(TextBlock(text=block["text"]))
```
But for streaming (`message_parser.py:158-169`), only raw StreamEvent wrappers are created, with no text extraction.
The e2e tests in `test_include_partial_messages.py:110-116` show users must manually accumulate:

```python
if event.get("type") == "content_block_delta":
    delta = event.get("delta", {})
    if delta.get("type") == "thinking_delta":
        thinking_deltas.append(delta.get("thinking", ""))
```
Further tracing in https://github.com/anthropics/claude-agent-sdk-python/blob/main/examples/streaming_mode.py
```python
async def example_basic_streaming():
    """Basic streaming with context manager."""
    print("=== Basic Streaming Example ===")

    async with ClaudeSDKClient() as client:
        print("User: What is 2+2?")
        await client.query("What is 2+2?")

        # Receive complete response using the helper method
        async for msg in client.receive_response():
            display_message(msg)
```
As a result, streaming is not supported:
How client.receive_response() Works
The Chain:

```
client.receive_response()
        ↓
client.receive_messages()
        ↓
self._query.receive_messages()   # Raw JSON from Claude CLI
        ↓
parse_message(data)              # Parse JSON → Message objects
        ↓
Yields: AssistantMessage, UserMessage, ResultMessage
```
Key Implementation:

```python
async def receive_response(self):
    async for message in self.receive_messages():
        yield message  # ← Yields COMPLETE message objects
        if isinstance(message, ResultMessage):
            return  # Stop after ResultMessage
```
What Gets Yielded:
The `parse_message()` function shows it parses complete blocks:

```python
case "text":
    content_blocks.append(TextBlock(text=block["text"]))                  # ← Complete text
case "thinking":
    content_blocks.append(ThinkingBlock(thinking=block["thinking"]))      # ← Complete thinking
```
The Answer to Your Question:
client.receive_response() yields complete message objects, not incremental text tokens. Each iteration gives you:
- AssistantMessage with complete TextBlock.text
- UserMessage with complete tool results
- ResultMessage with final stats
It does NOT yield token-by-token:

```
"H" → "He" → "Hel" → "Hello"
```

It yields message-by-message:

```
AssistantMessage(TextBlock(text="Hello! How can I help you today?"))  # Complete
UserMessage(ToolResultBlock(...))                                     # Complete
ResultMessage(duration=5.2s)                                          # Complete
```
This is implemented in PR #274 and working.
Tested with Claude Code: text streams incrementally when `accumulate_streaming_content=True` is set. Before this, TextBlock content only appeared once complete.
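For intuition, the accumulation behavior described above could be approximated like this. This is a sketch of the idea only, not the PR's actual implementation; `TextBlock` here is a stand-in dataclass, and the event dicts are simplified mocks of the raw API events:

```python
from dataclasses import dataclass


@dataclass
class TextBlock:  # stand-in for the SDK's TextBlock type
    text: str


def stream_partial_text_blocks(events):
    """Yield a growing TextBlock after each text delta.

    Sketch of the accumulate-as-you-stream idea: instead of handing callers
    raw delta events, fold each delta into the accumulated text and yield
    a TextBlock snapshot so the UI can re-render incrementally.
    """
    accumulated = ""
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                accumulated += delta.get("text", "")
                yield TextBlock(text=accumulated)


# Mock delta events reusing the chunk boundary from the example above:
events = [
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Would you lik"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "e to create..."}},
]
for block in stream_partial_text_blocks(events):
    print(block.text)
# → Would you lik
# → Would you like to create...
```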
See test code and reproduction steps in https://github.com/anthropics/claude-agent-sdk-python/pull/274#issuecomment-3449741394
The PR solves the manual delta-tracking problem discussed here. It would be good to get it merged.