
[FEATURE] Skills Context Overflow Problem - Allow us to hide all skills except for router skills

Open tachyon-beep opened this issue 2 months ago • 3 comments

Preflight Checklist

  • [x] I have searched existing requests and this feature hasn't been requested yet
  • [x] This is a single feature request (not multiple features)

Problem Statement

Skills Context Overflow Problem

Current Situation:

  • I've authored 100+ specialized skills organized behind ~10 router skills
  • Router skills are designed to dynamically load the appropriate specialized skill on-demand
  • Each skill's name + description consumes ~30-50 tokens in Claude's global context

The Problem: Without a way to hide or prioritize skills, all 100+ skills load their metadata into context automatically, causing:

  1. Router crowding: The 10 router skills that Claude should prioritize are buried among 100+ specialized skills, making skill discovery unreliable
  2. Wasted context budget: ~3,000-5,000 tokens consumed by metadata for skills that should only load on-demand when routed to
  3. Scalability ceiling: Can only load 1-2 skill packs before hitting context limits, defeating the purpose of the router pattern

Proposed Solution

What's Needed: Two complementary mechanisms:

  1. Hidden Skills (context efficiency)

Mark specialized skills as "hidden" so they don't consume context until explicitly invoked:


```yaml
name: python-engineering:async-debugging
description: Deep async/await debugging patterns
hidden: true  # Don't load metadata into global context
```

  2. Router Priority (skill discovery)

Mark router skills with higher priority so Claude checks them first:


```yaml
name: python-engineering
description: Routes to specialized Python skills based on problem type
router: true  # Prioritize during skill selection
```

Combined Effect:

  • Claude sees only 10 router skills in context (~300-500 tokens)
  • Routers are checked first during skill discovery
  • Specialized skills load on-demand when router invokes them
  • Pattern scales to hundreds of skills without context bloat

Example Hierarchy:

```
✓ python-engineering (router: true, visible, ~40 tokens)
├── python-engineering:async-debugging (hidden: true, 0 tokens until invoked)
├── python-engineering:testing-patterns (hidden: true, 0 tokens until invoked)
└── python-engineering:performance-profiling (hidden: true, 0 tokens until invoked)

✓ pytorch-engineering (router: true, visible, ~40 tokens)
├── pytorch-engineering:tensor-debugging (hidden: true, 0 tokens until invoked)
└── pytorch-engineering:distributed-training (hidden: true, 0 tokens until invoked)
```

Benefits:

  • Supports large-scale skill libraries (100+ skills)
  • Efficient context usage (only routers in global context)
  • Reliable skill discovery (routers prioritized)
  • Enables proper separation of concerns (routing vs. specialized knowledge)

Alternative Solutions

Right now, the manual workaround is enabling and disabling plugins as required; however, this requires reloading Claude Code and is severely disruptive.

Priority

High - Significant impact on productivity

Feature Category

Configuration and settings

Use Case Example

Scenario: A developer working on a PyTorch training loop encounters a CUDA out-of-memory error.

Without hidden/router fields (Current Behavior):

Developer: "My PyTorch training is crashing with CUDA OOM errors"

Claude's context at session start: Available skills (112 total, ~4,000 tokens):

  • python-engineering
  • python-engineering:async-debugging
  • python-engineering:testing-patterns
  • python-engineering:performance-profiling
  • pytorch-engineering
  • pytorch-engineering:tensor-debugging
  • pytorch-engineering:distributed-training
  • pytorch-engineering:memory-profiling
  • pytorch-engineering:mixed-precision
    ... (103 more skills)

Problem: Claude has to scan through 112 skill descriptions to find the right match. The pytorch-engineering router is buried. Claude might randomly pick pytorch-engineering:memory-profiling directly, bypassing the router's triage logic, or miss it entirely and not use any skill.


With hidden/router fields (Proposed Behavior):

Developer: "My PyTorch training is crashing with CUDA OOM errors"

Claude's context at session start: Available skills (10 routers, ~400 tokens):

  • python-engineering (router)
  • pytorch-engineering (router)
  • react-engineering (router)
  • testing-workflows (router)
  • database-engineering (router)
    ... (5 more routers)

Step-by-step flow:

  1. Skill Discovery: Claude scans the 10 router descriptions and immediately identifies that the pytorch-engineering router matches the keywords "PyTorch", "training", and "CUDA"
  2. Router Invocation: Claude invokes the pytorch-engineering router
  3. Router Logic: The router reads the error details and routes to the appropriate specialized skill: symptoms "CUDA OOM, training loop" → load pytorch-engineering:memory-profiling
  4. On-Demand Loading: The hidden skill pytorch-engineering:memory-profiling loads its full content (0 tokens → ~2,000 tokens)
  5. Problem Solving: The specialized skill provides deep diagnostic steps:
     • Check batch size vs. available VRAM
     • Inspect model parameter count
     • Check for memory leaks (detached tensors, cached gradients)
     • Suggest gradient accumulation or mixed-precision training
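
To make the flow above concrete, here is a minimal Python sketch of the two-stage pattern this proposal describes. It is not Claude Code's actual skill loader; the router keywords, directory layout, and helper names are all hypothetical. The point is that only router metadata needs to sit in global context, while specialized skills stay on disk until a router decides they are relevant.

```python
# Minimal sketch of the proposed two-stage routing pattern -- NOT Claude Code's
# actual skill loader. Router keywords, paths, and helper names are hypothetical.
from pathlib import Path

# Stage 0: only these router entries would occupy global context (~400 tokens).
ROUTERS = {
    "pytorch-engineering": {
        "keywords": {"pytorch", "cuda", "training", "oom", "tensor"},
        "skills": ["memory-profiling", "tensor-debugging", "distributed-training"],
    },
    "python-engineering": {
        "keywords": {"async", "await", "pytest", "profiling"},
        "skills": ["async-debugging", "testing-patterns", "performance-profiling"],
    },
}

def pick_router(query: str) -> str | None:
    """Stage 1 (skill discovery): scan only the ~10 router entries."""
    words = set(query.lower().split())
    scores = {name: len(words & cfg["keywords"]) for name, cfg in ROUTERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def load_specialized_skill(router: str, skill: str,
                           skills_dir: str = ".claude/skills") -> str:
    """Stages 3-4 (routing + on-demand load): read the hidden skill's full content."""
    return Path(skills_dir, f"{router}:{skill}", "SKILL.md").read_text(encoding="utf-8")

router = pick_router("My PyTorch training is crashing with CUDA OOM errors")
# -> "pytorch-engineering"; its triage logic would then pick "memory-profiling",
# and only that skill's ~2,000 tokens enter context.
```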

Benefits in this scenario:

  • ✅ Fast skill discovery (10 vs. 112 skills to scan)
  • ✅ Correct routing logic applied (router triages the problem)
  • ✅ Efficient context usage (only loads relevant specialized skill)
  • ✅ Deep expertise when needed (full skill content available after routing)

Another Example: Multi-Domain Projects

Developer working on a full-stack ML application:

Session 1: "I need to optimize my FastAPI endpoints" → Sees python-engineering router → Routes to python-engineering:async-patterns → Context: ~400 (routers) + ~2,000 (one skill) = ~2,400 tokens

Session 2: "My React frontend is re-rendering too much" → Sees react-engineering router → Routes to react-engineering:performance-optimization → Context: ~400 (routers) + ~2,000 (one skill) = ~2,400 tokens

Session 3: "My PyTorch model won't fit on the GPU" → Sees pytorch-engineering router → Routes to pytorch-engineering:memory-profiling → Context: ~400 (routers) + ~2,000 (one skill) = ~2,400 tokens

Without hidden/router fields:

  • Each session starts with ~4,000 tokens of skill metadata
  • Random skill selection bypasses router logic
  • Can't install all three skill packs (context overflow)

With hidden/router fields:

  • Each session starts with ~400 tokens of router metadata
  • Reliable routing to specialized skills
  • All three skill packs installed and working efficiently

Additional Context

No response

tachyon-beep · Nov 05 '25 08:11

Suggestion: put only the "router skills" in skills, and the rest in a docs/ folder??

tino · Nov 06 '25 15:11

Empirical Research Supporting This Issue

I conducted independent research that validates and quantifies the skill budget problem described here. Closing my duplicate issues (#13099, #13100) to consolidate discussion.

Key Findings

| Metric | Value |
| --- | --- |
| Empirical budget | ~15,500-16,000 characters |
| Per-skill overhead | ~109 characters (XML tags, name, location) |
| With 63 skills | Only 42 visible (33% hidden) |
| With 92 skills | Only 36 visible (60% hidden, per #13044) |

The Math

Total per skill = description_length + 109 chars overhead
Budget fills at ~15,700 characters total

| Description Length | Skills That Fit |
| --- | --- |
| 263 chars (typical) | ~42 skills |
| 150 chars | ~60 skills |
| 130 chars | ~67 skills |
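
As a sanity check, the capacity numbers above follow from simple division. A small Python sketch, using the ~15,700-character budget and ~109-character per-skill overhead (both empirical measurements, not documented limits):

```python
# Rough capacity estimate from the empirical figures above.
BUDGET_CHARS = 15_700        # measured fill point, not a documented limit
OVERHEAD_PER_SKILL = 109     # XML tags, name, location per skill entry

def skills_that_fit(description_length: int) -> int:
    """Approximate number of skill entries that fit in the metadata budget."""
    return BUDGET_CHARS // (description_length + OVERHEAD_PER_SKILL)

for desc_len in (263, 150, 130):
    print(f"{desc_len:>3} chars -> ~{skills_that_fit(desc_len)} skills")
# 263 chars -> ~42 skills; 150 chars -> ~60 skills; 130 chars -> ~65 skills
# (the ~67 in the table comes from the upper end of the budget range:
#  16_000 / 239 ≈ 67)
```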

Critical Clarification: Skills ≠ Tools

There's been some confusion linking this issue to #12836 (Tool Search Tool / defer_loading). This is incorrect.

TOOLS (API-level)           → defer_loading CAN help
SKILLS (available_skills)   → defer_loading DOES NOT apply

The Tool Search Tool beta solves context bloat for MCP tools, but it will NOT help with skill visibility. Skills require the solution proposed here (hidden: true, router: true).

Additional Suggestions

Beyond the excellent hidden/router proposal, consider:

  1. Warnings/visibility: Show which skills are hidden, warn when installing skills that exceed budget
  2. Documentation: Until this is fixed, document the ~16K limit so users can work around it
  3. Workaround for now: Users can compress descriptions to ≤130 chars to fit more skills
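
For anyone applying workaround 3 today, here is a rough budget-check script. It assumes skills live as `<skills-dir>/<skill-name>/SKILL.md` with a single-line `description:` field in the frontmatter; adjust the path and parsing for your own layout, and treat the budget constants as the empirical estimates above rather than guaranteed limits.

```python
#!/usr/bin/env python3
"""Rough skill-metadata budget check (workaround helper, not an official tool).

Assumes skills live as <skills_dir>/<skill>/SKILL.md with a single-line
`description:` field in their frontmatter; adjust for your own layout.
"""
from pathlib import Path
import re
import sys

BUDGET_CHARS = 15_700      # empirical budget from the findings above
OVERHEAD_PER_SKILL = 109   # empirical per-skill wrapper overhead

def description_of(skill_md: Path) -> str:
    """Naive frontmatter parse: grab the first `description:` line."""
    match = re.search(r"^description:\s*(.+)$",
                      skill_md.read_text(encoding="utf-8"), re.MULTILINE)
    return match.group(1).strip() if match else ""

def main(skills_dir: str) -> None:
    total = 0
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        desc = description_of(skill_md)
        cost = len(desc) + OVERHEAD_PER_SKILL
        total += cost
        note = "  <- consider compressing to <=130 chars" if len(desc) > 130 else ""
        print(f"{skill_md.parent.name}: {cost} chars{note}")
    status = ("OVER budget - some skills will likely be hidden"
              if total > BUDGET_CHARS else "within budget")
    print(f"\nTotal: {total} / ~{BUDGET_CHARS} chars ({status})")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".claude/skills")
```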

Full Research

Detailed investigation with methodology, evidence, and reproduction steps: https://gist.github.com/alexey-pelykh/faa3c304f731d6a962efc5fa2a43abe1

This independently verifies findings from #12782 (within 3% of their calculations).


+1 for hidden: true and router: true - this would properly solve the scalability problem for skill libraries.

alexey-pelykh · Dec 05 '25 08:12

This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.

github-actions[bot] · Jan 05 '26 10:01