feat(agents,voice): Unified Voice System - Personality-Driven TTS with Dynamic Agent Routing
Summary
This PR consolidates voice configuration into a single-source-of-truth architecture, enabling personality-specific text-to-speech for dynamically composed agents. Each agent now outputs an `[AGENT:voicename]` tag, which the hook system extracts for voice routing, so each agent speaks in a distinct voice matched to its traits.
The Problem
Previously, voice configuration was scattered across multiple files:
- `Traits.yaml` contained a `voice_registry` with hardcoded voice IDs
- `AgentPersonalities.md` duplicated voice settings in JSON blocks
- Hooks resolved voice IDs locally, bypassing server-side flexibility
- No consistent linking between agent personality and voice output
The Solution
Architecture: Voice NAME as the linking key
┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Agent Output   │────▶│   Hook System    │────▶│   Voice Server   │
│ [AGENT:academic]│     │ Extracts tag,    │     │ Resolves from:   │
│                 │     │ sends voice NAME │     │ • ENV vars (ID)  │
└─────────────────┘     └──────────────────┘     │ • JSON (settings)│
                                                 └──────────────────┘
Key Changes
Version bump
Bumped PACK version to v1.4.0 and added a changelog.
Agent Output Format
Dynamic agents now include voice routing in their COMPLETED output:
🎯 **COMPLETED**: [AGENT:academic] Found 15 security vulnerabilities
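As a rough illustration of how a hook might pull the voice name out of that line, here is a minimal sketch. The helper name `extractVoiceName` and the exact regex are assumptions made for this example; they are not taken from `subagent-stop-hook-voice.ts`.

```typescript
// Hypothetical pattern: matches tags such as [AGENT:academic].
// The character class is an assumption about which names are valid.
const AGENT_TAG = /\[AGENT:([A-Za-z0-9_-]+)\]/;

// Hypothetical helper: returns the lowercased voice name, or null when no tag is present.
function extractVoiceName(line: string): string | null {
  const match = AGENT_TAG.exec(line);
  // Lowercase so ACADEMIC, Academic, and academic all route to the same voice.
  return match?.[1]?.toLowerCase() ?? null;
}
```

Under these assumptions, the hook would forward only the returned name (for example `academic`) to the voice server rather than a resolved voice ID.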
Configuration Consolidation
| What | Where | Purpose |
|---|---|---|
| Voice IDs | `$PAI_DIR/.env` | `ELEVENLABS_VOICE_ACADEMIC=<id>` |
| Voice Settings | `voice-personalities.json` | `stability`, `similarity_boost` |
| Voice Mappings | `Traits.yaml` | trait → voice name |
Technical Improvements
- Handlebars `{{lowercase}}` helper - Ensures consistent voice name casing
- Case-insensitive voice lookup - `ACADEMIC`, `Academic`, `academic` all resolve
- Smart .env resolution - Server auto-discovers .env from script location
- Hook passthrough - Sends voice NAME, server resolves ID from ENV
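The server-side lookup described in the list above could look roughly like the sketch below. It is not the code in `server.ts`: the function name `resolveVoice`, the type shapes, and the decision to treat an empty variable as missing are all assumptions. Only the `ELEVENLABS_VOICE_<NAME>` variables, the `ELEVENLABS_VOICE_DEFAULT` fallback, and the `stability` and `similarity_boost` settings come from this PR.

```typescript
// Assumed settings shape: only these two fields are named in this PR.
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

interface ResolvedVoice {
  voiceId: string;
  settings: VoiceSettings | undefined;
}

// Hypothetical resolver: voice NAME in, voice ID (from ENV) and settings (from JSON) out.
function resolveVoice(
  name: string,
  env: Record<string, string | undefined>,
  personalities: Record<string, VoiceSettings>,
): ResolvedVoice | null {
  // Normalize once so the lookup is case-insensitive.
  const key = name.trim().toLowerCase();
  // Prefer the personality-specific variable, then fall back to the default voice.
  const voiceId =
    env[`ELEVENLABS_VOICE_${key.toUpperCase()}`] || env["ELEVENLABS_VOICE_DEFAULT"];
  if (!voiceId) return null;
  return { voiceId, settings: personalities[key] };
}
```

In a real server the `env` argument would presumably be `process.env` and `personalities` the parsed contents of `voice-personalities.json`. Passing both in as parameters keeps the sketch easy to exercise in isolation.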
Files Changed
| File | Pack | Change |
|---|---|---|
| `Traits.yaml` | kai-agents-skill | Removed `voice_registry`, lowercase voice names |
| `AgentFactory.ts` | kai-agents-skill | Added `{{lowercase}}` Handlebars helper |
| `DynamicAgent.hbs` | kai-agents-skill | Added `[AGENT:xxx]` output format |
| `AgentPersonalities.md` | kai-agents-skill | Added COMPLETED format, removed JSON block |
| `voice-personalities.json` | kai-voice-system | Added all 14 voice personalities |
| `server.ts` | kai-voice-system | Case-insensitive lookup + ENV resolution |
| `subagent-stop-hook-voice.ts` | kai-voice-system | Pass voice NAME instead of resolved ID |
| `INSTALL.md` | kai-agents-skill | Updated Step 4 with ENV var documentation |
Voice Personalities
professional • authoritative • academic • warm • gentle
energetic • dynamic • sophisticated • intense • gritty
intern • architect • engineer • pai
Bug Fixes
- Regex pattern fix: Corrected colon position for `**COMPLETED**:` detection (colon outside asterisks)
- Direct text detection: Subagent hook now finds COMPLETED in direct assistant responses, not just Task tool results
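To make the colon-position fix concrete, the sketch below shows one pattern that expects the colon after the closing asterisks. The actual regex in the hook is not shown in this PR, so `COMPLETED_LINE` and `findCompletedMessage` are hypothetical names used for illustration only.

```typescript
// Assumed pattern: the colon follows the closing asterisks, as in "**COMPLETED**:".
// A variant with the colon inside the asterisks ("**COMPLETED:**") will not match.
const COMPLETED_LINE = /\*\*COMPLETED\*\*:\s*(.+)/;

// Hypothetical helper: returns the text after the marker on the same line, or null.
// It takes plain text, so it applies equally to a Task tool result or a direct response.
function findCompletedMessage(text: string): string | null {
  const match = COMPLETED_LINE.exec(text);
  return match?.[1]?.trim() ?? null;
}
```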
Testing Checklist
- [x] Dynamic agent outputs `🎯 **COMPLETED**: [AGENT:voicename] message`
- [x] Named agent (Intern) outputs `🎯 **COMPLETED**: [AGENT:intern] message`
- [x] Voice server extracts voice name correctly (case-insensitive)
- [x] ENV var lookup works: `ELEVENLABS_VOICE_ACADEMIC`
- [x] voice-personalities.json lookup works: `voices["academic"]`
- [x] Fallback to `ELEVENLABS_VOICE_DEFAULT` works
- [x] Academic voice plays correctly on agent completion
Breaking Changes
None. Existing agents continue to work. New [AGENT:xxx] tags are additive.
Migration
Add voice IDs to $PAI_DIR/.env:
ELEVENLABS_VOICE_ACADEMIC=<your_voice_id>
ELEVENLABS_VOICE_PROFESSIONAL=<your_voice_id>
# ... etc
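A quick way to check which variables are still unset after migrating is sketched below. It assumes that every personality listed under Voice Personalities follows the same `ELEVENLABS_VOICE_<NAME>` pattern as the two variables shown above. The helper name `missingVoiceEnvVars` is hypothetical and is not part of this PR.

```typescript
// The 14 voice names listed in this PR.
const VOICE_NAMES = [
  "professional", "authoritative", "academic", "warm", "gentle",
  "energetic", "dynamic", "sophisticated", "intense", "gritty",
  "intern", "architect", "engineer", "pai",
];

// Hypothetical helper: lists the expected variables that are unset or empty.
// Assumes each voice maps to ELEVENLABS_VOICE_<UPPERCASE NAME>.
function missingVoiceEnvVars(env: Record<string, string | undefined>): string[] {
  return VOICE_NAMES
    .map((name) => `ELEVENLABS_VOICE_${name.toUpperCase()}`)
    .filter((key) => !env[key]);
}
```

Calling it with the loaded environment (for example `process.env` in a Node or Bun script) returns the names that still need a voice ID. Any voice left unset would presumably fall back to `ELEVENLABS_VOICE_DEFAULT`.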
🤖 Generated with Claude Code
Additional analytics, design, and implementation documents:
- 2026-01-05-unified-voice-system-analytics.md
- 2026-01-05-unified-voice-system-design.md
- 2026-01-05-unified-voice-system-implement.md
Thank you @sti0 for this unified voice system work! 🙏
This is ambitious and valuable. PAI v2.1 restructured the codebase (kai-* → pai-*), which affects the paths here. Your vision for personality-driven TTS is great - would love to see this revisited against the new structure!
See the release: https://github.com/danielmiessler/PAI/releases/tag/v2.1.0