refactor: condense AGENTS.md following best practices
Summary
Refactored AGENTS.md following HumanLayer's best practices for writing effective AI agent guidance files:
- Reduced AGENTS.md from 442 lines to 96 lines (was 7x longer than recommended)
- Created `docs/agent-guides/` directory for progressive disclosure of detailed topics
- Replaced embedded code examples with file references to prevent outdated snippets
- Kept only universally applicable instructions in the main file
Progressive Disclosure Files Created
| File | Content |
|---|---|
| docs/agent-guides/testing-patterns.md | Vitest patterns, test structure, coverage requirements |
| docs/agent-guides/ui-development.md | Next.js/React patterns for agents-manage-ui |
| docs/agent-guides/documentation.md | MDX documentation guidelines for agents-docs |
| docs/agent-guides/database-migrations.md | Drizzle migration workflow and critical rules |
| docs/agent-guides/git-worktrees.md | Parallel development with git worktrees |
| docs/agent-guides/debugging.md | Jaeger tracing commands and debugging workflows |
Key Improvements
- Less is more: Frontier models can follow ~150-200 instructions max; previous file likely exceeded this
- Universal applicability: Detailed task-specific content moved to separate files
- Pointers over copies: Code examples replaced with references to actual files
- Progressive disclosure: Agents only see detailed guides when they navigate to them
Test plan
- [ ] Verify AI agents can still find essential commands in AGENTS.md
- [ ] Verify links to docs/agent-guides/ files work correctly
- [ ] Test that agents can follow progressive disclosure to find detailed information
⚠️ No Changeset found
Latest commit: 8d3ca96a503108b4f9117d7158dc094e9e2f7ba3
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
Vercel deployments

| Project | Updated (UTC) |
|---|---|
| agents-docs | Dec 2, 2025 9:26pm |
| agents-manage-api | Dec 2, 2025 9:26pm |
| agents-manage-ui | Dec 2, 2025 9:26pm |
| agents-run-api | Dec 2, 2025 9:26pm |
Claude finished @amikofalvy's task
Staff Engineer Review Complete
I've conducted a comprehensive architectural review of this PR covering both the AGENTS.md refactoring and the context tracking evaluation system. Here's my analysis:
🎯 Overall Assessment: Strong Refactoring with Well-Architected Context Tracking
This PR successfully addresses two distinct objectives: progressive disclosure for agent guidance and sophisticated context analysis capabilities. Both are well-executed with thoughtful architectural decisions.
📋 AGENTS.md Refactoring Analysis
✅ Excellent Progressive Disclosure Implementation
The refactoring from 442 to 96 lines following HumanLayer's best practices is architecturally sound:
- Cognitive Load Reduction: Maintains ~150-200 instruction limit for frontier models
- Universal Applicability: Core AGENTS.md now contains only universally relevant commands and patterns
- Smart Information Architecture: Task-specific content properly separated into `docs/agent-guides/`
✅ Well-Structured Guide Files
The progressive disclosure files demonstrate good documentation patterns:
- Consistent Structure: All guides follow similar format (Stack → Commands → Guidelines)
- Practical Focus: Each guide provides immediately actionable information
- Proper Scope: Each guide covers a cohesive domain (testing, UI, docs, etc.)
⚠️ Minor Naming Inconsistency
Issue: File is named Agents.md (PascalCase) but referenced as AGENTS.md in documentation.
Recommendation:

```shell
# Standardize on the AGENTS.md convention
git mv Agents.md AGENTS.md
```
🔍 Context Tracking System Analysis
This is a sophisticated addition that adds significant observability value. The implementation demonstrates strong system design principles:
✅ Excellent Token Estimation Strategy
agents-run-api/src/utils/token-estimator.ts:
- Model Agnostic: 4-chars-per-token approximation works across OpenAI/Anthropic/Gemini
- Performance Optimized: Simple string length calculation vs expensive tokenizer calls
- Appropriate Precision: Perfect for relative comparisons between context components
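As a sketch of the character-based approach described above (the constant of ~4 characters per token is from this review; the actual `token-estimator.ts` implementation may differ in detail):

```typescript
// Hypothetical sketch of a character-based token estimator.
// CHARS_PER_TOKEN = 4 is the approximation cited in the review.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  // Length-based approximation: fast, model-agnostic, dependency-free,
  // at the cost of 15-25% error on some content types (code vs prose).
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

This trades absolute accuracy for speed, which is fine when the goal is comparing the relative size of context components rather than billing-grade counts.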
✅ Strong Type Safety & Extensibility
ContextBreakdown Interface Design:

```typescript
export interface ContextBreakdown {
  systemPromptTemplate: number;
  coreInstructions: number;
  agentPrompt: number;
  // ... 11 total components with clear semantic naming
  total: number;
}
```
- Comprehensive Coverage: Captures all major context components
- Clear Semantics: Field names clearly indicate their purpose
- Future-Proof: Easy to extend with new components
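One way to keep `total` consistent with the component fields as new components are added is to derive it rather than set it by hand (a hypothetical helper, not part of the PR):

```typescript
// Hypothetical helper: derive `total` from the individual components
// so the invariant total === sum(components) always holds.
type ComponentFields = Record<string, number>;

export function withTotal<T extends ComponentFields>(
  components: T
): T & { total: number } {
  const total = Object.values(components).reduce((sum, n) => sum + n, 0);
  return { ...components, total };
}
```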
✅ Sophisticated UI Implementation
agents-manage-ui/src/components/traces/context-breakdown.tsx:
- Progressive Visual Design: Stacked bar + detailed breakdown provides multiple levels of insight
- Accessibility: Proper ARIA labeling, tooltips, and color coding
- Performance: `useMemo` for expensive calculations
- User Experience: Filters out segments below 0.5% to avoid visual clutter
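The sub-0.5% segment filter can be sketched as follows (hypothetical `Segment` shape and function name; the actual logic lives in `context-breakdown.tsx`):

```typescript
// Hypothetical sketch of the "hide segments below 0.5%" filter.
interface Segment {
  label: string;
  tokens: number;
}

export function visibleSegments(
  segments: Segment[],
  total: number,
  minFraction = 0.005 // 0.5% threshold mentioned in the review
): Segment[] {
  if (total <= 0) return [];
  return segments.filter((s) => s.tokens / total >= minFraction);
}
```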
✅ Clean Architectural Integration
System Integration Points:
- SystemPromptBuilder: Returns `AssembleResult` with both prompt and breakdown
- Phase1Config/Phase2Config: Implement token tracking throughout prompt assembly
- UI Components: Seamlessly integrates breakdown display in trace viewer
🔧 Architectural Considerations & Recommendations
1. Token Estimation Accuracy Trade-offs
Current Approach: Character-based approximation (~4 chars/token)

Considerations:
- ✅ Fast, model-agnostic, dependency-free
- ⚠️ May be 15-25% off for some content types (code vs prose)
Recommendation: Current approach is excellent for the use case. Consider adding a configuration option for precision-sensitive scenarios:

```typescript
interface TokenEstimatorConfig {
  method: 'character-based' | 'model-specific';
  model?: string; // For future model-specific tokenizers
}
```
2. Context Component Granularity
Current Breakdown: 11 components tracked individually

Consideration: Balance between granularity and usability
Recommendation: Current granularity is appropriate. Consider grouping for high-level views:

```typescript
interface ContextSummary {
  instructions: number; // systemPromptTemplate + coreInstructions + agentPrompt
  tools: number;        // toolsSection + artifactComponents
  history: number;      // conversationHistory
  // ... etc
}
```
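The mapping from the detailed breakdown to that summary shape could look like this (hypothetical; only a subset of the 11 `ContextBreakdown` fields is shown):

```typescript
// Hypothetical grouping from the detailed breakdown to a summary view.
// Field names follow the interfaces sketched in this review, not
// necessarily the exact shape in the PR.
interface BreakdownSubset {
  systemPromptTemplate: number;
  coreInstructions: number;
  agentPrompt: number;
  toolsSection: number;
  artifactComponents: number;
  conversationHistory: number;
}

interface ContextSummary {
  instructions: number;
  tools: number;
  history: number;
}

export function summarize(b: BreakdownSubset): ContextSummary {
  return {
    instructions: b.systemPromptTemplate + b.coreInstructions + b.agentPrompt,
    tools: b.toolsSection + b.artifactComponents,
    history: b.conversationHistory,
  };
}
```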
3. Performance & Memory Considerations
Current Implementation: Creates breakdown objects for every prompt assembly
Considerations:
- Memory usage for high-volume scenarios
- CPU overhead of token estimation
Recommendations:
- ✅ Current implementation is efficient (simple calculations)
- Consider adding feature flag for production environments where breakdown isn't needed
- Potential optimization: Cache breakdown results for identical prompt components
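The caching idea can be sketched as a simple memo keyed by the component text (hypothetical; assumes identical text always yields an identical estimate):

```typescript
// Hypothetical cache for token estimates, keyed by the prompt
// component's text. Avoids re-estimating identical components.
const cache = new Map<string, number>();

export function cachedEstimate(
  text: string,
  estimate: (t: string) => number
): number {
  const hit = cache.get(text);
  if (hit !== undefined) return hit;
  const value = estimate(text);
  cache.set(text, value);
  return value;
}
```

In a real implementation the cache would need a size bound (e.g. LRU eviction) to stay safe in high-volume scenarios.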
🚨 Critical Items (None Found)
No critical architectural flaws or security issues identified. Both the documentation refactoring and context tracking implementation follow solid engineering principles.
🎯 Customer Impact Assessment
Positive Impacts:
- Developer Experience: Cleaner, more discoverable agent guidance
- Debugging Capabilities: Rich context analysis for troubleshooting token usage
- Performance Insights: Ability to optimize prompts based on component breakdown
- Maintainability: Progressive disclosure makes guidance easier to maintain
Risk Assessment: LOW
- No breaking changes to APIs or data contracts
- Additive features with graceful degradation
- Backward compatible changes only
📈 Future Extensibility
The architecture supports several natural extensions:
- Token Budget Warnings: Alert when approaching model context limits
- Component Optimization: Suggest which components to trim when over budget
- Historical Analysis: Track context evolution over time
- A/B Testing: Compare token efficiency across prompt variations
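A token-budget warning like the first extension above could start as simple as (hypothetical function; the 90% threshold is an assumption):

```typescript
// Hypothetical budget check: flag when total context tokens approach
// a model's context limit. warnAt = 0.9 (90%) is an assumed default.
export function overBudget(
  totalTokens: number,
  contextLimit: number,
  warnAt = 0.9
): boolean {
  return totalTokens >= contextLimit * warnAt;
}
```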
✅ Final Recommendation: APPROVE
This PR demonstrates excellent system architecture across both objectives:
- AGENTS.md Refactoring: Thoughtful progressive disclosure following industry best practices
- Context Tracking: Well-designed observability system with strong technical implementation
The only suggestion is the minor naming standardization for AGENTS.md. Otherwise, this is production-ready code that adds significant value to the platform.
Confidence Level: High - This is well-architected, thoroughly implemented, and follows TypeScript/React best practices throughout.