feat(voice): Add provider-agnostic TTS support

Open Steffen025 opened this issue 2 days ago • 0 comments

Summary

This PR adds three voice system improvements:

1. Provider-agnostic TTS support - Choose between ElevenLabs and Google Cloud TTS
2. Fix voice spam from parallel agents - Prevent Explore/Plan agents from triggering voice output
3. Runtime volume control - Adjust volume without restarting the server

1. Provider-Agnostic TTS Support

Problem

The current voice system assumes ElevenLabs as the only TTS provider. Users who want to use Google Cloud TTS cannot easily switch providers.

Solution

Detect the TTS_PROVIDER environment variable in getVoiceId()
Support GOOGLE_TTS_VOICE_ID for Google Cloud TTS configuration
Maintain full backward compatibility with ElevenLabs (default behavior unchanged)
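
As a rough illustration of the lookup described above, a provider-aware `getVoiceId()` could look like the sketch below. This is not the code in `identity.ts`: the explicit `env` parameter, the `ELEVENLABS_VOICE_ID` variable name, and the fallback to ElevenLabs for unrecognized `TTS_PROVIDER` values are assumptions made for the example.

```typescript
// Sketch only: the real getVoiceId() in identity.ts may differ.
type Env = Record<string, string | undefined>;

// Hypothetical name for the existing ElevenLabs voice setting.
const ELEVENLABS_VOICE_VAR = "ELEVENLABS_VOICE_ID";

function getVoiceId(env: Env): string {
  const provider = (env["TTS_PROVIDER"] ?? "").trim().toLowerCase();

  if (provider === "google") {
    // An empty string means "let the voice server use its default voice".
    return (env["GOOGLE_TTS_VOICE_ID"] ?? "").trim();
  }

  // Unset or unrecognized provider (assumption): keep the ElevenLabs behavior.
  return (env[ELEVENLABS_VOICE_VAR] ?? "").trim();
}
```

Passing the environment in explicitly, for example `getVoiceId(process.env)`, keeps the lookup easy to exercise without touching the real environment.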

Usage

ElevenLabs (default - no changes needed):

# Existing setup continues to work

Google Cloud TTS:

TTS_PROVIDER=google
GOOGLE_TTS_VOICE_ID=en-US-Chirp3-HD-Charon  # Optional, uses server default if empty

2. Fix Voice Spam from Parallel Agents

Problem

Generic fallback patterns in extractCompletionMessage() matched any completion message, including those from Explore and Plan agents. Running several research agents in parallel therefore produced a burst of unwanted voice notifications.

Solution

Remove generic fallback patterns for agent output - agent voice output is now opt-in only
Add a silent-tier check for native agents (Explore, Plan)
Keep agent-specific patterns (🗣️ AgentName: and COMPLETED: [AGENT:type])

Voice Tier System

| Tier | Agent Type | Voice Output |
| --- | --- | --- |
| Silent | Explore, Plan (native) | ❌ No voice |
| Voiced | Engineer, Architect, etc. | ✅ Yes (via [AGENT:type]) |
| Main | Main agent (PAI) | ✅ Yes (fallback remains) |
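
The silent and voiced tiers can be sketched roughly as follows. This is an illustration under assumptions, not the hook's actual code: the two-argument signature, the `getVoiceTier` helper, and the exact regular expressions (including the assumption that the spoken message follows the marker on the same line) are hypothetical. The main-agent fallback is outside the scope of this sketch.

```typescript
// Sketch only: the real AgentOutputCapture hook may structure this differently.
type VoiceTier = "silent" | "voiced";

// Native agent types that never speak, per the tier table.
const SILENT_AGENT_TYPES = new Set(["explore", "plan"]);

function getVoiceTier(agentType: string): VoiceTier {
  return SILENT_AGENT_TYPES.has(agentType.trim().toLowerCase()) ? "silent" : "voiced";
}

// Agent-specific patterns only; there is deliberately no generic fallback.
// U+1F5E3 is written as a surrogate pair, with an optional variation selector.
const SPOKEN_LINE = /^\uD83D\uDDE3\uFE0F?\s*([^:\n]+):\s*(.+)$/m;
const COMPLETED_LINE = /^COMPLETED:\s*\[AGENT:([^\]\n]+)\]\s*(.+)$/m;

function extractCompletionMessage(agentType: string, output: string): string | null {
  // Silent-tier agents are rejected before any pattern is tried.
  if (getVoiceTier(agentType) === "silent") return null;

  const match = SPOKEN_LINE.exec(output) ?? COMPLETED_LINE.exec(output);
  // Group 2 holds the text that would be spoken.
  return match?.[2]?.trim() ?? null;
}
```

Checking the tier first means an Explore or Plan agent stays silent even if its output happens to contain one of the opt-in markers.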

3. Runtime Volume Control

Problem

Volume settings are read once at server startup and cached. Users cannot adjust volume without restarting the voice server.

Solution

getVolumeSetting() now reads from ~/.claude/VoiceServer/volume.json on each call
New CLI tool VoiceServer/Tools/voice-volume.ts for runtime volume adjustments
Changes take effect immediately (no server restart needed)
Priority: volume.json > voices.json default_volume > 0.5 (default)
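
A minimal sketch of the fresh-read priority chain, under stated assumptions: the `volume` key inside volume.json, the 0 to 1 valid range, and the injected `readText` reader are hypothetical and only serve to keep the example self-contained. The actual `getVolumeSetting()` in server.ts may read the files directly.

```typescript
// Sketch only: reads both config files on every call, nothing is cached.
// readText returns the file contents, or null if the file is missing.
type ReadText = (path: string) => string | null;

const DEFAULT_VOLUME = 0.5;

function parseVolume(text: string | null, key: string): number | null {
  if (text === null) return null;
  try {
    const value: unknown = (JSON.parse(text) as Record<string, unknown>)[key];
    // Assumed valid range: 0 (muted) to 1 (full volume).
    if (typeof value === "number" && value >= 0 && value <= 1) return value;
    return null;
  } catch {
    // Malformed JSON is treated like a missing file.
    return null;
  }
}

function getVolumeSetting(readText: ReadText, dir: string): number {
  // volume.json > voices.json default_volume > 0.5
  return (
    parseVolume(readText(`${dir}/volume.json`), "volume") ??
    parseVolume(readText(`${dir}/voices.json`), "default_volume") ??
    DEFAULT_VOLUME
  );
}
```

The chain uses `??` rather than `||` so that a muted volume of 0 is honored instead of being treated as missing.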

Usage

bun run VoiceServer/Tools/voice-volume.ts           # Show current volume
bun run VoiceServer/Tools/voice-volume.ts 0.3       # Set to 30%
bun run VoiceServer/Tools/voice-volume.ts up        # +10%
bun run VoiceServer/Tools/voice-volume.ts down      # -10%
bun run VoiceServer/Tools/voice-volume.ts mute      # 0%
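
The argument handling behind these commands might be sketched as below. This is an assumption-laden illustration rather than the contents of voice-volume.ts: the `nextVolume` name, the interpretation of "+10%" as a 0.1 step on a 0 to 1 scale, and the clamping and rounding are all hypothetical.

```typescript
// Sketch only: maps a CLI argument to the next volume value.
function nextVolume(current: number, arg: string): number {
  const STEP = 0.1; // assumed meaning of "+10%" / "-10%"
  const normalized = arg.trim().toLowerCase();
  let target: number;

  switch (normalized) {
    case "up":
      target = current + STEP;
      break;
    case "down":
      target = current - STEP;
      break;
    case "mute":
      target = 0;
      break;
    default:
      // Anything else must be a plain number such as "0.3".
      target = Number(normalized);
      if (normalized === "" || !Number.isFinite(target)) {
        throw new Error(`Unrecognized volume argument: ${arg}`);
      }
  }

  // Clamp to [0, 1] and round to two decimals so repeated steps
  // do not accumulate floating-point error.
  return Math.round(Math.min(1, Math.max(0, target)) * 100) / 100;
}
```

Writing the result to volume.json is all the CLI would then need to do, since the server re-reads that file on each call.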

Changes

| File | Change |
| --- | --- |
| `Packs/pai-hook-system/src/hooks/lib/identity.ts` | Provider-aware `getVoiceId()` |
| `Releases/v2.3/.claude/hooks/lib/identity.ts` | Same update for release version |
| `.env.example` | Document new TTS configuration options |
| `Packs/pai-hook-system/src/hooks/AgentOutputCapture.hook.ts` | Remove generic patterns, add silent tier |
| `Releases/v2.3/.claude/hooks/AgentOutputCapture.hook.ts` | Same fix for release version |
| `Packs/pai-voice-system/src/VoiceServer/server.ts` | Fresh-read volume from config |
| `Releases/v2.3/.claude/VoiceServer/server.ts` | Same update for release version |
| `Packs/pai-voice-system/src/VoiceServer/Tools/voice-volume.ts` | NEW - Volume control CLI |
| `Releases/v2.3/.claude/VoiceServer/Tools/voice-volume.ts` | NEW - Volume control CLI |

Test plan

Provider-agnostic TTS

[ ] Existing ElevenLabs setups work without changes
[x] Setting TTS_PROVIDER=google switches to Google TTS
[ ] Voice notifications play correctly with both providers

Voice spam fix

[x] Explore agents don't trigger voice output
[x] Plan agents don't trigger voice output
[x] Engineer/Architect agents still get voice (via [AGENT:type] pattern)
[x] Main agent fallback still works

Volume control

[x] voice-volume.ts CLI shows current volume
[x] Volume changes take effect without server restart
[x] Priority chain works: volume.json > voices.json > default

🤖 Generated with Claude Code

Jan 17 '26 12:01 Steffen025