
feat(agents,voice): Unified Voice System - Personality-Driven TTS with Dynamic Agent Routing

Open • sti0 opened this issue 2 months ago • 1 comment

feat: Unified Voice System - Personality-Driven TTS with Dynamic Agent Routing

Summary

This PR consolidates voice configuration into a single-source-of-truth architecture, enabling personality-specific text-to-speech for dynamically composed agents. Each agent now emits an [AGENT:voicename] tag that the hook system extracts for voice routing, so each agent speaks with a distinct vocal personality based on its traits.

The Problem

Previously, voice configuration was scattered across multiple files:

  • Traits.yaml contained a voice_registry with hardcoded voice IDs
  • AgentPersonalities.md duplicated voice settings in JSON blocks
  • Hooks resolved voice IDs locally, bypassing server-side resolution
  • No consistent link between agent personality and voice output

The Solution

Architecture: Voice NAME as the linking key

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Agent Output   │────▶│   Hook System    │────▶│   Voice Server   │
│ [AGENT:academic]│     │ Extracts tag,    │     │ Resolves from:   │
│                 │     │ sends voice NAME │     │ • ENV vars (ID)  │
└─────────────────┘     └──────────────────┘     │ • JSON (settings)│
                                                 └──────────────────┘

Key Changes

Version bump

Bumped PACK version to v1.4.0 and added a changelog.

Agent Output Format

Dynamic agents now include voice routing in their COMPLETED output:

🎯 **COMPLETED**: [AGENT:academic] Found 15 security vulnerabilities
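A minimal sketch of how a hook might pull the voice name and the message out of such a line, assuming exactly the format above. The function name parseCompletedLine and its return shape are hypothetical, not the PR's actual hook code; lowercasing the name mirrors the case-insensitive lookup listed under Technical Improvements.

```typescript
// Hypothetical hook-side parser for the COMPLETED line format shown above.
interface CompletedLine {
  voice: string | null; // lowercased name from [AGENT:...], or null when no tag is present
  message: string;      // text following the tag, to be spoken
}

// The colon sits outside the bold markers: **COMPLETED**:
// The [AGENT:name] group is optional, so untagged lines still match.
const COMPLETED_RE = /\*\*COMPLETED\*\*:\s*(?:\[AGENT:([^\]]+)\]\s*)?(.*)/;

function parseCompletedLine(line: string): CompletedLine | null {
  const match = COMPLETED_RE.exec(line);
  if (match === null) return null;
  const voice = match[1] !== undefined ? match[1].trim().toLowerCase() : null;
  return { voice, message: match[2].trim() };
}

const parsed = parseCompletedLine("🎯 **COMPLETED**: [AGENT:academic] Found 15 security vulnerabilities");
console.log(parsed?.voice);   // academic
console.log(parsed?.message); // Found 15 security vulnerabilities
```

Lines without a tag still parse, with voice set to null, which fits the note that the new tags are additive.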

Configuration Consolidation

What            Where                     Contents
Voice IDs       $PAI_DIR/.env             ELEVENLABS_VOICE_ACADEMIC=<id>
Voice Settings  voice-personalities.json  stability, similarity_boost
Voice Mappings  Traits.yaml               trait → voice name
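A sketch of the server-side lookup implied by this table and the testing checklist, assuming env vars follow the ELEVENLABS_VOICE_<NAME> pattern, per-voice settings sit under voices[<name>] in voice-personalities.json, and ELEVENLABS_VOICE_DEFAULT is the fallback ID. The function and type names are hypothetical, not taken from the PR's server.ts, and a real server would pass process.env instead of a plain object.

```typescript
// Hypothetical voice resolution: name -> { ENV voice ID, JSON settings }.
interface VoiceSettings {
  stability: number;
  similarity_boost: number;
}

interface PersonalitiesFile {
  voices: Record<string, VoiceSettings>; // keyed by lowercase voice name (assumed)
}

interface ResolvedVoice {
  voiceId: string;
  settings: VoiceSettings | undefined;
}

function resolveVoice(
  name: string | null,
  env: Record<string, string | undefined>,
  personalities: PersonalitiesFile,
): ResolvedVoice | null {
  // Case-insensitive: ACADEMIC, Academic and academic all normalize to "academic".
  const key = (name ?? "").trim().toLowerCase();
  // Voice ID comes from ENV; fall back to ELEVENLABS_VOICE_DEFAULT.
  const voiceId =
    (key && env[`ELEVENLABS_VOICE_${key.toUpperCase()}`]) || env.ELEVENLABS_VOICE_DEFAULT;
  if (!voiceId) return null; // nothing configured at all
  // Settings come from voice-personalities.json when an entry exists.
  return { voiceId, settings: key ? personalities.voices[key] : undefined };
}

// Placeholder IDs and settings, for illustration only.
const env = { ELEVENLABS_VOICE_ACADEMIC: "id-academic", ELEVENLABS_VOICE_DEFAULT: "id-default" };
const personalities = { voices: { academic: { stability: 0.5, similarity_boost: 0.75 } } };
console.log(resolveVoice("ACADEMIC", env, personalities)?.voiceId); // id-academic
console.log(resolveVoice("gritty", env, personalities)?.voiceId);   // id-default
```

Keeping IDs in ENV and settings in JSON means the hook only ever handles a name, which is the linking key shown in the architecture diagram.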

Technical Improvements

  • Handlebars {{lowercase}} helper - Ensures consistent voice name casing
  • Case-insensitive voice lookup - ACADEMIC, Academic, academic all resolve
  • Smart .env resolution - Server auto-discovers .env from script location
  • Hook passthrough - Sends voice NAME, server resolves ID from ENV
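One way the .env auto-discovery could work, sketched under the assumption that the server checks its own directory and then each parent directory for a .env file, and reads simple KEY=VALUE lines. The PR does not spell out the search order; candidateEnvPaths and parseDotEnv are hypothetical names, and the sketch uses plain string handling on absolute POSIX paths so it runs without Node imports (a real server would use node:path and node:fs).

```typescript
// Candidate .env locations, nearest first:
// /a/b/c -> /a/b/c/.env, /a/b/.env, /a/.env, /.env
function candidateEnvPaths(scriptDir: string): string[] {
  const parts = scriptDir.split("/").filter((p) => p.length > 0);
  const candidates: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    const dir = "/" + parts.slice(0, i).join("/");
    candidates.push(dir === "/" ? "/.env" : `${dir}/.env`);
  }
  return candidates;
}

// Minimal KEY=VALUE parser: skips blank lines and # comments,
// strips one pair of matching surrounding quotes from values.
function parseDotEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.endsWith(value[0])) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

const sample = parseDotEnv('# voices\nELEVENLABS_VOICE_ACADEMIC="abc123"\n');
console.log(sample.ELEVENLABS_VOICE_ACADEMIC); // abc123
// Example path only:
console.log(candidateEnvPaths("/opt/pai/voice-server").join(" "));
// /opt/pai/voice-server/.env /opt/pai/.env /opt/.env /.env
```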

Files Changed

File                         Pack               Change
Traits.yaml                  kai-agents-skill   Removed voice_registry, lowercase voice names
AgentFactory.ts              kai-agents-skill   Added {{lowercase}} Handlebars helper
DynamicAgent.hbs             kai-agents-skill   Added [AGENT:xxx] output format
AgentPersonalities.md        kai-agents-skill   Added COMPLETED format, removed JSON block
voice-personalities.json     kai-voice-system   Added all 14 voice personalities
server.ts                    kai-voice-system   Case-insensitive lookup + ENV resolution
subagent-stop-hook-voice.ts  kai-voice-system   Pass voice NAME instead of resolved ID
INSTALL.md                   kai-agents-skill   Updated Step 4 with ENV var documentation

Voice Personalities

professional • authoritative • academic • warm • gentle
energetic • dynamic • sophisticated • intense • gritty
intern • architect • engineer • pai

Bug Fixes

  • Regex pattern fix: Corrected colon position for **COMPLETED**: detection (colon outside asterisks)
  • Direct text detection: Subagent hook now finds COMPLETED in direct assistant responses, not just Task tool results
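A sketch of both fixes, assuming a simplified transcript in which each entry is either a direct assistant reply or a Task tool result. The entry shape and findCompletedLine are illustrative only, not Claude Code's transcript format or the PR's hook implementation.

```typescript
// Old pattern expected the colon inside the bold markers (**COMPLETED:**);
// agents actually emit it outside them (**COMPLETED**:).
const OLD_RE = /\*\*COMPLETED:\*\*/;
const FIXED_RE = /\*\*COMPLETED\*\*:/;

// Assumed, simplified transcript entry shape.
type TranscriptEntry =
  | { kind: "assistant_text"; text: string } // direct assistant response
  | { kind: "task_result"; text: string };   // output returned via the Task tool

// Return the most recent line containing a COMPLETED marker.
// Both entry kinds are scanned, not just Task tool results.
function findCompletedLine(entries: TranscriptEntry[]): string | null {
  for (let i = entries.length - 1; i >= 0; i--) {
    const lines = entries[i].text.split("\n");
    for (let j = lines.length - 1; j >= 0; j--) {
      if (FIXED_RE.test(lines[j])) return lines[j];
    }
  }
  return null;
}

const sample = "🎯 **COMPLETED**: [AGENT:intern] Task finished";
console.log(OLD_RE.test(sample), FIXED_RE.test(sample)); // false true
```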

Testing Checklist

  • [x] Dynamic agent outputs 🎯 **COMPLETED**: [AGENT:voicename] message
  • [x] Named agent (Intern) outputs 🎯 **COMPLETED**: [AGENT:intern] message
  • [x] Voice server extracts voice name correctly (case-insensitive)
  • [x] ENV var lookup works: ELEVENLABS_VOICE_ACADEMIC
  • [x] voice-personalities.json lookup works: voices["academic"]
  • [x] Fallback to ELEVENLABS_VOICE_DEFAULT works
  • [x] Academic voice plays correctly on agent completion

Breaking Changes

None. Existing agents continue to work. New [AGENT:xxx] tags are additive.

Migration

Add voice IDs to $PAI_DIR/.env:

ELEVENLABS_VOICE_ACADEMIC=<your_voice_id>
ELEVENLABS_VOICE_PROFESSIONAL=<your_voice_id>
# ... etc

🤖 Generated with Claude Code

sti0 • Jan 05 '26 00:01

Thank you @sti0 for this unified voice system work! 🙏

This is ambitious and valuable. PAI v2.1 restructured the codebase (kai-* → pai-*), which affects the paths here. Your vision for personality-driven TTS is great; would love to see this revisited against the new structure!

See the release: https://github.com/danielmiessler/PAI/releases/tag/v2.1.0

danielmiessler • Jan 08 '26 12:01