
[FEATURE] Proactive Context Compression

Open roeetal opened this issue 5 months ago • 2 comments

Problem Statement

The current strands-agents SDK triggers context compression only reactively, after a ContextWindowOverflowException has already been thrown. This reactive approach has several critical limitations:

  1. Output Token Starvation: LLM context limits combine input and output tokens. When input tokens approach the limit, the model may have insufficient capacity to generate meaningful responses, leading to truncated or poor-quality outputs.

  2. Performance Degradation: Waiting until context overflow occurs means the agent operates at maximum context capacity, potentially degrading response quality and increasing latency.

  3. Inefficient Resource Usage: Operating at context limits wastes computational resources and may trigger unnecessary retries.

The current SlidingWindowConversationManager and SummarizingConversationManager only act when reduce_context() is called after an exception, missing opportunities for proactive optimization.
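
For reference, today's reactive flow looks roughly like this (a simplified sketch; the model method name is illustrative, and the real retry logic lives in the SDK's event loop):

try:
    response = agent.model.converse(agent.messages)
except ContextWindowOverflowException:
    # Compression only happens here, after a request has already failed.
    agent.conversation_manager.reduce_context(agent)
    response = agent.model.converse(agent.messages)  # retry with smaller context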

Proposed Solution

Implement Proactive Context Compression that triggers automatically when context usage reaches a configurable threshold (default: 70% of context window limit), preventing context overflow before it occurs.

Core Components

1. Proactive Compression Interface

Extend the ConversationManager abstract base class:

@abstractmethod
def should_compress(self, agent: "Agent", **kwargs: Any) -> bool:
    """Determine if proactive compression should be triggered.
    
    Args:
        agent: The agent whose context will be evaluated.
        **kwargs: Additional parameters for compression decision.
        
    Returns:
        True if compression should be triggered, False otherwise.
    """
    pass

@abstractmethod  
def get_compression_threshold(self) -> float:
    """Get the current compression threshold as a percentage (0.0-1.0)."""
    pass
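
A minimal sketch of how a manager could satisfy this interface, assuming the ProactiveCompressionConfig described below and hypothetical token helpers on the model (estimate_tokens() and context_limit do not exist in the SDK today; see the comments at the end of this issue):

def should_compress(self, agent: "Agent", **kwargs: Any) -> bool:
    """Trigger once estimated usage crosses the configured threshold."""
    if not self.config.enable_proactive_compression:
        return False
    if len(agent.messages) < self.config.min_messages_before_compression:
        return False
    # Hypothetical model helpers; see #1294 and #1295 below.
    used = agent.model.estimate_tokens(agent.messages)
    return used / agent.model.context_limit >= self.get_compression_threshold()

def get_compression_threshold(self) -> float:
    return self.config.compression_threshold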

2. Enhanced Apply Management

Modify apply_management() to support proactive compression:

def apply_management(self, agent: "Agent", **kwargs: Any) -> None:
    """Apply management strategy including proactive compression."""
    if self.should_compress(agent, **kwargs):
        self.reduce_context(agent, **kwargs)
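
Routing the proactive check through apply_management() keeps reduce_context() as the single compression code path: existing summarization and sliding-window behavior is reused unchanged, and only the trigger moves from exception handling to a threshold check.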

3. Configuration Options

Add configuration parameters to conversation managers:

from dataclasses import dataclass

@dataclass
class ProactiveCompressionConfig:
    compression_threshold: float = 0.7  # Fraction of the context window (70%)
    enable_proactive_compression: bool = True
    min_messages_before_compression: int = 2
    compression_cooldown_messages: int = 1  # Prevent rapid re-compression

Implementation Strategy

Phase 1: Infrastructure

  1. Extend ConversationManager base class with proactive compression methods
  2. Add compression threshold configuration to existing managers
  3. Implement token estimation utilities (see the heuristic sketch after this list)
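
As a stopgap until the model interface exposes real token counts, a character-based heuristic is one possible estimator (the ~4 characters per token ratio is a common rule of thumb, not an SDK guarantee):

import json

def estimate_tokens(messages: list[dict], chars_per_token: float = 4.0) -> int:
    """Rough estimate: serialized character count over an assumed chars-per-token ratio."""
    text = json.dumps(messages, default=str)
    return int(len(text) / chars_per_token)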

Phase 2: Manager Updates

  1. Update SummarizingConversationManager with proactive compression
  2. Update SlidingWindowConversationManager with threshold-based trimming
  3. Add compression cooldown logic to prevent thrashing (see the sketch after this list)
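
Cooldown could be as simple as a message counter on the manager; this is a sketch of one possible shape, not a settled design:

def _cooldown_active(self, agent: "Agent") -> bool:
    """Skip compression if too few messages have arrived since the last one."""
    # self._last_compression_size is initialized to 0 in __init__.
    grown = len(agent.messages) - self._last_compression_size
    return grown < self.config.compression_cooldown_messages

def reduce_context(self, agent: "Agent", **kwargs: Any) -> None:
    super().reduce_context(agent, **kwargs)
    self._last_compression_size = len(agent.messages)  # record for cooldown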

Phase 3: Integration

  1. Integrate with Agent._run_loop() for automatic triggering
  2. Add telemetry and metrics for compression events
  3. Implement configuration validation and error handling (validation is sketched after this list)
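
If ProactiveCompressionConfig is implemented as a dataclass (as sketched above), validation could live in __post_init__; this is an assumed shape, not existing SDK behavior:

def __post_init__(self) -> None:
    """Fail fast on nonsensical configuration values."""
    if not 0.0 < self.compression_threshold < 1.0:
        raise ValueError(f"compression_threshold must be in (0, 1), got {self.compression_threshold}")
    if self.min_messages_before_compression < 0:
        raise ValueError("min_messages_before_compression must be non-negative")
    if self.compression_cooldown_messages < 0:
        raise ValueError("compression_cooldown_messages must be non-negative")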

Use Cases

1. Long-Running Conversations

agent = Agent(
    model=model,
    conversation_manager=SummarizingConversationManager(
        enable_proactive_compression=True,
        compression_threshold=0.7,  # Compress at 70% capacity
        preserve_recent_messages=10
    )
)

# Agent automatically compresses context before hitting limits
for i in range(100):
    result = agent(f"Tell me about topic {i}")
    # Compression happens transparently when needed

2. Tool-Heavy Workflows

agent = Agent(
    model=model,
    tools=[data_analysis_tool, api_client, file_processor],
    conversation_manager=SummarizingConversationManager(
        compression_threshold=0.6,  # More aggressive for tool-heavy workflows
        preserve_recent_messages=5
    )
)

# Large tool results are compressed before context overflow
result = agent("Process this large dataset and generate insights")

3. Multi-Agent Scenarios

# Coordinator agent with proactive compression
coordinator = Agent(
    conversation_manager=SlidingWindowConversationManager(
        window_size=50,
        compression_threshold=0.8,  # Conservative threshold
        enable_proactive_compression=True
    )
)

Alternative Solutions

1. Reactive Compression with Prediction

  • Keep current reactive model but add predictive logic
  • Pros: Minimal changes to existing architecture
  • Cons: Still risks output token starvation, doesn't solve core problem

2. Dynamic Context Window Adjustment

  • Automatically adjust context window size based on usage patterns
  • Pros: More flexible resource usage
  • Cons: Complex implementation, model-dependent behavior

Additional Context

No response

roeetal · Jul 28 '25 16:07

It seems like a very interesting feature. Is its implementation planned?

m-peirone-reply · Nov 14 '25 14:11

I ran into this same problem and realized the SDK is missing two foundational pieces needed to implement threshold-based compression:

  1. No way to estimate current token usage - #1294 proposes adding estimate_tokens() to the Model interface
  2. No way to know the context limit - #1295 proposes adding a context_limit property to Model

Without these, you cannot calculate what "70% of context window" actually means. Once both are available, proactive compression becomes straightforward.
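
With those two pieces in place, the threshold check reduces to a few lines (hypothetical, contingent on #1294 and #1295 landing):

def context_usage_ratio(agent) -> float:
    """Fraction of the context window consumed by the current conversation."""
    return agent.model.estimate_tokens(agent.messages) / agent.model.context_limit

# Compress once usage crosses the configured threshold, e.g. 0.7.
if context_usage_ratio(agent) >= 0.7:
    agent.conversation_manager.reduce_context(agent)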

westonbrown · Dec 05 '25 18:12