refactor: condense AGENTS.md following best practices
Summary
Refactored AGENTS.md following HumanLayer's best practices for writing effective AI agent guidance files:
- Reduced AGENTS.md from 442 lines to 96 lines (was 7x longer than recommended)
- Created `docs/agent-guides/` directory for progressive disclosure of detailed topics
- Replaced embedded code examples with file references to prevent outdated snippets
- Kept only universally applicable instructions in the main file
Progressive Disclosure Files Created
| File | Content |
|---|---|
| docs/agent-guides/testing-patterns.md | Vitest patterns, test structure, coverage requirements |
| docs/agent-guides/ui-development.md | Next.js/React patterns for agents-manage-ui |
| docs/agent-guides/documentation.md | MDX documentation guidelines for agents-docs |
| docs/agent-guides/database-migrations.md | Drizzle migration workflow and critical rules |
| docs/agent-guides/git-worktrees.md | Parallel development with git worktrees |
| docs/agent-guides/debugging.md | Jaeger tracing commands and debugging workflows |
Key Improvements
- Less is more: Frontier models can follow ~150-200 instructions max; previous file likely exceeded this
- Universal applicability: Detailed task-specific content moved to separate files
- Pointers over copies: Code examples replaced with references to actual files
- Progressive disclosure: Agents only see detailed guides when they navigate to them
Test plan
- [ ] Verify AI agents can still find essential commands in AGENTS.md
- [ ] Verify links to docs/agent-guides/ files work correctly
- [ ] Test that agents can follow progressive disclosure to find detailed information
⚠️ No Changeset found
Latest commit: 8d3ca96a503108b4f9117d7158dc094e9e2f7ba3
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
Vercel deployments

| Project | Updated (UTC) |
|---|---|
| agents-docs | Dec 2, 2025 9:26pm |
| agents-manage-api | Dec 2, 2025 9:26pm |
| agents-manage-ui | Dec 2, 2025 9:26pm |
| agents-run-api | Dec 2, 2025 9:26pm |
Claude finished @amikofalvy's task
Staff Engineer Review Complete
I've conducted a comprehensive architectural review of this PR covering both the AGENTS.md refactoring and the context tracking evaluation system. Here's my analysis:
🎯 Overall Assessment: Strong Refactoring with Well-Architected Context Tracking
This PR successfully addresses two distinct objectives: progressive disclosure for agent guidance and sophisticated context analysis capabilities. Both are well-executed with thoughtful architectural decisions.
📋 AGENTS.md Refactoring Analysis
✅ Excellent Progressive Disclosure Implementation
The refactoring from 442 to 96 lines following HumanLayer's best practices is architecturally sound:
- Cognitive Load Reduction: Maintains ~150-200 instruction limit for frontier models
- Universal Applicability: Core AGENTS.md now contains only universally relevant commands and patterns
- Smart Information Architecture: Task-specific content properly separated into `docs/agent-guides/`
✅ Well-Structured Guide Files
The progressive disclosure files demonstrate good documentation patterns:
- Consistent Structure: All guides follow similar format (Stack → Commands → Guidelines)
- Practical Focus: Each guide provides immediately actionable information
- Proper Scope: Each guide covers a cohesive domain (testing, UI, docs, etc.)
⚠️ Minor Naming Inconsistency
Issue: File is named Agents.md (PascalCase) but referenced as AGENTS.md in documentation.
Recommendation:

```shell
# Standardize on the AGENTS.md convention
git mv Agents.md AGENTS.md
```
🔍 Context Tracking System Analysis
This is a sophisticated addition that adds significant observability value. The implementation demonstrates strong system design principles:
✅ Excellent Token Estimation Strategy
agents-run-api/src/utils/token-estimator.ts:
- Model Agnostic: 4-chars-per-token approximation works across OpenAI/Anthropic/Gemini
- Performance Optimized: Simple string length calculation vs expensive tokenizer calls
- Appropriate Precision: Perfect for relative comparisons between context components
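As a sketch of the character-based approach described above (the constant of ~4 characters per token is from this review; the actual `token-estimator.ts` implementation may differ in detail):

```typescript
// Hypothetical sketch of a character-based token estimator.
// CHARS_PER_TOKEN = 4 is the approximation cited in the review.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  // Length-based approximation: fast, model-agnostic, dependency-free,
  // at the cost of 15-25% error on some content types (code vs prose).
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

This trades absolute accuracy for speed, which is fine when the goal is comparing the relative size of context components rather than billing-grade counts.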
✅ Strong Type Safety & Extensibility
ContextBreakdown Interface Design:

```typescript
export interface ContextBreakdown {
  systemPromptTemplate: number;
  coreInstructions: number;
  agentPrompt: number;
  // ... 11 total components with clear semantic naming
  total: number;
}
```
- Comprehensive Coverage: Captures all major context components
- Clear Semantics: Field names clearly indicate their purpose
- Future-Proof: Easy to extend with new components
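One way to keep `total` consistent with the component fields as new components are added is to derive it rather than set it by hand (a hypothetical helper, not part of the PR):

```typescript
// Hypothetical helper: derive `total` from the individual components
// so the invariant total === sum(components) always holds.
type ComponentFields = Record<string, number>;

export function withTotal<T extends ComponentFields>(
  components: T
): T & { total: number } {
  const total = Object.values(components).reduce((sum, n) => sum + n, 0);
  return { ...components, total };
}
```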
✅ Sophisticated UI Implementation
agents-manage-ui/src/components/traces/context-breakdown.tsx:
- Progressive Visual Design: Stacked bar + detailed breakdown provides multiple levels of insight
- Accessibility: Proper ARIA labeling, tooltips, and color coding
- Performance: `useMemo` for expensive calculations
- User Experience: Filters out segments below 0.5% to avoid visual clutter
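The sub-0.5% segment filter can be sketched as follows (hypothetical `Segment` shape and function name; the actual logic lives in `context-breakdown.tsx`):

```typescript
// Hypothetical sketch of the "hide segments below 0.5%" filter.
interface Segment {
  label: string;
  tokens: number;
}

export function visibleSegments(
  segments: Segment[],
  total: number,
  minFraction = 0.005 // 0.5% threshold mentioned in the review
): Segment[] {
  if (total <= 0) return [];
  return segments.filter((s) => s.tokens / total >= minFraction);
}
```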
✅ Clean Architectural Integration
System Integration Points:
- SystemPromptBuilder: Returns `AssembleResult` with both prompt and breakdown
- Phase1Config/Phase2Config: Implement token tracking throughout prompt assembly
- UI Components: Seamlessly integrates breakdown display in trace viewer
🔧 Architectural Considerations & Recommendations
1. Token Estimation Accuracy Trade-offs
Current Approach: Character-based approximation (~4 chars/token)

Considerations:
- ✅ Fast, model-agnostic, dependency-free
- ⚠️ May be 15-25% off for some content types (code vs prose)
Recommendation: Current approach is excellent for the use case. Consider adding a configuration option for precision-sensitive scenarios:

```typescript
interface TokenEstimatorConfig {
  method: 'character-based' | 'model-specific';
  model?: string; // For future model-specific tokenizers
}
```
2. Context Component Granularity
Current Breakdown: 11 components tracked individually

Consideration: Balance between granularity and usability
Recommendation: Current granularity is appropriate. Consider grouping for high-level views:

```typescript
interface ContextSummary {
  instructions: number; // systemPromptTemplate + coreInstructions + agentPrompt
  tools: number;        // toolsSection + artifactComponents
  history: number;      // conversationHistory
  // ... etc
}
```
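The mapping from the detailed breakdown to that summary shape could look like this (hypothetical; only a subset of the 11 `ContextBreakdown` fields is shown):

```typescript
// Hypothetical grouping from the detailed breakdown to a summary view.
// Field names follow the interfaces sketched in this review, not
// necessarily the exact shape in the PR.
interface BreakdownSubset {
  systemPromptTemplate: number;
  coreInstructions: number;
  agentPrompt: number;
  toolsSection: number;
  artifactComponents: number;
  conversationHistory: number;
}

interface ContextSummary {
  instructions: number;
  tools: number;
  history: number;
}

export function summarize(b: BreakdownSubset): ContextSummary {
  return {
    instructions: b.systemPromptTemplate + b.coreInstructions + b.agentPrompt,
    tools: b.toolsSection + b.artifactComponents,
    history: b.conversationHistory,
  };
}
```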
3. Performance & Memory Considerations
Current Implementation: Creates breakdown objects for every prompt assembly
Considerations:
- Memory usage for high-volume scenarios
- CPU overhead of token estimation
Recommendations:
- ✅ Current implementation is efficient (simple calculations)
- Consider adding feature flag for production environments where breakdown isn't needed
- Potential optimization: Cache breakdown results for identical prompt components
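The caching idea can be sketched as a simple memo keyed by the component text (hypothetical; assumes identical text always yields an identical estimate):

```typescript
// Hypothetical cache for token estimates, keyed by the prompt
// component's text. Avoids re-estimating identical components.
const cache = new Map<string, number>();

export function cachedEstimate(
  text: string,
  estimate: (t: string) => number
): number {
  const hit = cache.get(text);
  if (hit !== undefined) return hit;
  const value = estimate(text);
  cache.set(text, value);
  return value;
}
```

In a real implementation the cache would need a size bound (e.g. LRU eviction) to stay safe in high-volume scenarios.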
🚨 Critical Items (None Found)
No critical architectural flaws or security issues identified. Both the documentation refactoring and context tracking implementation follow solid engineering principles.
🎯 Customer Impact Assessment
Positive Impacts:
- Developer Experience: Cleaner, more discoverable agent guidance
- Debugging Capabilities: Rich context analysis for troubleshooting token usage
- Performance Insights: Ability to optimize prompts based on component breakdown
- Maintainability: Progressive disclosure makes guidance easier to maintain
Risk Assessment: LOW
- No breaking changes to APIs or data contracts
- Additive features with graceful degradation
- Backward compatible changes only
📈 Future Extensibility
The architecture supports several natural extensions:
- Token Budget Warnings: Alert when approaching model context limits
- Component Optimization: Suggest which components to trim when over budget
- Historical Analysis: Track context evolution over time
- A/B Testing: Compare token efficiency across prompt variations
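A token-budget warning like the first extension above could start as simple as (hypothetical function; the 90% threshold is an assumption):

```typescript
// Hypothetical budget check: flag when total context tokens approach
// a model's context limit. warnAt = 0.9 (90%) is an assumed default.
export function overBudget(
  totalTokens: number,
  contextLimit: number,
  warnAt = 0.9
): boolean {
  return totalTokens >= contextLimit * warnAt;
}
```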
✅ Final Recommendation: APPROVE
This PR demonstrates excellent system architecture across both objectives:
- AGENTS.md Refactoring: Thoughtful progressive disclosure following industry best practices
- Context Tracking: Well-designed observability system with strong technical implementation
The only suggestion is the minor naming standardization for AGENTS.md. Otherwise, this is production-ready code that adds significant value to the platform.
Confidence Level: High - This is well-architected, thoroughly implemented, and follows TypeScript/React best practices throughout.