
streaming slowdown when context rises

Open · PierrunoYT opened this issue 1 year ago · 0 comments

The streaming slowdown when context rises can be attributed to several factors in the current implementation:

  1. Context Management Mechanism:
  • Uses a static window approach instead of a dynamic sliding window, in order to preserve prompt caching
  • When the context gets too large, it triggers truncateHalfConversation, which can cause noticeable delays (see the sketch after this list)
  • The system waits until the context is already too large before compressing, rather than managing it preemptively
  2. Streaming Implementation Bottlenecks:
  • The current debouncer has a fixed 25 ms delay for processing chunks
  • All chunks are processed in sequence, which can cause backpressure when the context is large
  • The system retries up to 3 times when the context is too long, and each retry adds latency
  3. Memory Management:
  • Large contexts are kept in memory until they hit the maximum token limit
  • The smart truncation system keeps 8 recent messages intact, which can be excessive for very large contexts
  • Context compression only happens reactively when hitting limits, rather than proactively
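
For concreteness, here is a minimal sketch of the reactive pattern points 1 and 3 describe. Only truncateHalfConversation, the 8-message preservation count, and the "compress only after the limit is hit" trigger come from the points above; the Message type, countTokens, and MAX_CONTEXT_TOKENS are placeholders, not the extension's real API.

```ts
// A minimal sketch of the reactive truncation pattern described above, not the
// actual claude-coder implementation. Message, countTokens and
// MAX_CONTEXT_TOKENS are placeholders; truncateHalfConversation and the
// 8-message preservation count mirror the behavior noted in the list.
interface Message {
  role: "user" | "assistant";
  content: string;
}

const MAX_CONTEXT_TOKENS = 180_000;    // assumed hard limit, not the real value
const RECENT_MESSAGES_TO_PRESERVE = 8; // recent messages kept intact

function countTokens(messages: Message[]): number {
  // Placeholder estimate; a real implementation would use the model tokenizer.
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Drops roughly the older half of the middle of the conversation, keeping the
// first message (task setup) and the most recent messages intact.
function truncateHalfConversation(messages: Message[]): Message[] {
  if (messages.length <= RECENT_MESSAGES_TO_PRESERVE + 1) return messages;
  const first = messages[0];
  const recent = messages.slice(-RECENT_MESSAGES_TO_PRESERVE);
  const middle = messages.slice(1, -RECENT_MESSAGES_TO_PRESERVE);
  const keptMiddle = middle.slice(Math.floor(middle.length / 2));
  return [first, ...keptMiddle, ...recent];
}

// Reactive: compression only runs once the limit has already been hit, so the
// request that crosses the threshold pays the full truncation cost.
function prepareContext(messages: Message[]): Message[] {
  return countTokens(messages) > MAX_CONTEXT_TOKENS
    ? truncateHalfConversation(messages)
    : messages;
}
```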

The slowdown is primarily caused by:

  • The reactive nature of context compression (only happens when hitting limits)
  • Sequential processing of chunks with fixed delays (sketched just after this list)
  • Keeping too many recent messages intact during truncation
  • Multiple retry attempts when context is too long
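
As a rough illustration of the fixed-delay, sequential chunk handling: only the 25 ms figure comes from the description above; the names and queue shape here are hypothetical, not the extension's actual streaming code.

```ts
// Hypothetical sketch of a fixed-delay chunk debouncer. Only the 25 ms delay
// comes from the issue text; onStreamChunk and pendingChunks are illustrative
// names, not the extension's real API.
const CHUNK_DEBOUNCE_MS = 25;

let pendingChunks: string[] = [];
let flushTimer: ReturnType<typeof setTimeout> | null = null;

function onStreamChunk(chunk: string, render: (text: string) => void): void {
  pendingChunks.push(chunk);
  if (flushTimer !== null) return; // a flush is already scheduled
  flushTimer = setTimeout(() => {
    // Chunks are rendered strictly in arrival order. With a large context each
    // flush carries more work, but the delay never adapts, so the queue (and
    // perceived latency) grows as the conversation does.
    render(pendingChunks.join(""));
    pendingChunks = [];
    flushTimer = null;
  }, CHUNK_DEBOUNCE_MS);
}
```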

To improve performance, consider:

  1. Implementing proactive context compression before hitting limits
  2. Adjusting the RECENT_MESSAGES_TO_PRESERVE count based on context size
  3. Using a dynamic debouncer delay based on context size
  4. Implementing parallel chunk processing for large contexts
  5. Adding progressive context compression instead of waiting for full truncation

These changes would help maintain consistent streaming performance even as context size increases.
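
A rough sketch of suggestions 1 and 3, reusing the placeholder helpers from the first sketch above; the 80% threshold and the 10–50 ms delay range are arbitrary illustration values, not recommended constants.

```ts
// Sketch of suggestions 1 and 3, reusing Message, countTokens,
// MAX_CONTEXT_TOKENS and truncateHalfConversation from the first sketch.
// The 0.8 threshold and 10-50 ms bounds are arbitrary illustration values.
const PROACTIVE_COMPRESSION_RATIO = 0.8;

// Suggestion 1: start compressing before the hard limit is hit, so no single
// request pays the full truncation-plus-retry cost.
function maybeCompressProactively(messages: Message[]): Message[] {
  return countTokens(messages) > MAX_CONTEXT_TOKENS * PROACTIVE_COMPRESSION_RATIO
    ? truncateHalfConversation(messages)
    : messages;
}

// Suggestion 3: scale the debounce delay with context size, so small contexts
// stay responsive while large ones batch more output per flush.
function dynamicDebounceMs(contextTokens: number): number {
  const ratio = Math.min(contextTokens / MAX_CONTEXT_TOKENS, 1);
  return Math.round(10 + ratio * 40);
}
```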

PierrunoYT · Nov 17 '24