Personal_AI_Infrastructure
feat(voice): Add provider-agnostic TTS support
Summary
This PR adds three voice system improvements:
- Provider-agnostic TTS support: choose between ElevenLabs and Google Cloud TTS
- Fix voice spam from parallel agents: prevent Explore/Plan agents from triggering voice output
- Runtime volume control: adjust volume without restarting the server
1. Provider-Agnostic TTS Support
Problem
The current voice system assumes ElevenLabs as the only TTS provider. Users who want to use Google Cloud TTS cannot easily switch providers.
Solution
- Add `TTS_PROVIDER` environment variable detection in the `getVoiceId()` function
- Support `GOOGLE_TTS_VOICE_ID` for Google Cloud TTS configuration
- Maintain full backward compatibility with ElevenLabs (default behavior unchanged)
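The provider check can be sketched as follows. This is an illustration under stated assumptions, not the code in `identity.ts`. The function takes the environment and the existing ElevenLabs voice ID as parameters, and the real function's signature may differ. It also treats an empty string as "let the server choose", which mirrors the "uses server default if empty" note for `GOOGLE_TTS_VOICE_ID`.

```typescript
// Hypothetical sketch: the real getVoiceId() in identity.ts may differ.
type Env = Record<string, string | undefined>;

function getVoiceId(env: Env, elevenLabsVoiceId: string): string {
  // Unset or unrecognized TTS_PROVIDER keeps the existing ElevenLabs behavior.
  const provider = (env.TTS_PROVIDER ?? "elevenlabs").trim().toLowerCase();

  if (provider === "google") {
    // Empty string = let the voice server pick its default voice (assumed convention).
    return env.GOOGLE_TTS_VOICE_ID?.trim() ?? "";
  }
  return elevenLabsVoiceId;
}
```

Passing `process.env` as `env` gives the runtime behavior. Taking it as a parameter keeps the function easy to test without mutating global state.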
Usage
ElevenLabs (default - no changes needed):
# Existing setup continues to work
Google Cloud TTS:
TTS_PROVIDER=google
GOOGLE_TTS_VOICE_ID=en-US-Chirp3-HD-Charon # Optional, uses server default if empty
2. Fix Voice Spam from Parallel Agents
Problem
Generic fallback patterns in `extractCompletionMessage()` matched ANY completion message, including those from Explore/Plan agents. This caused voice spam when running parallel research agents.
Solution
- Remove generic fallback patterns - voice output is now opt-in only
- Add silent tier check for native agents (Explore, Plan)
- Keep agent-specific patterns (🗣️ AgentName: and COMPLETED: [AGENT:type])
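Here is a minimal sketch of the opt-in extraction. It assumes both markers appear on a single line, in the forms `🗣️ AgentName: message` and `COMPLETED: [AGENT:type] message`. The function shape, the `agentType` parameter and the regular expressions are illustrative assumptions. The hook's actual patterns may differ.

```typescript
// Hypothetical sketch of opt-in extraction; the real hook's regexes may differ.
const SILENT_AGENTS: ReadonlySet<string> = new Set(["Explore", "Plan"]);

interface Completion {
  agent: string;
  message: string;
}

function extractCompletionMessage(output: string, agentType?: string): Completion | null {
  // Silent tier: native research agents never produce voice output.
  if (agentType !== undefined && SILENT_AGENTS.has(agentType)) return null;

  // Opt-in pattern 1: "🗣️ AgentName: message" (variation selector U+FE0F optional).
  const spoken = output.match(/\u{1F5E3}\u{FE0F}?\s*(\w+):\s*(.+)/u);
  if (spoken) return { agent: spoken[1], message: spoken[2].trim() };

  // Opt-in pattern 2: "COMPLETED: [AGENT:type] message".
  const completed = output.match(/COMPLETED:\s*\[AGENT:(\w+)\]\s*(.+)/);
  if (completed) return { agent: completed[1], message: completed[2].trim() };

  // No generic fallback: any other text stays silent.
  return null;
}
```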
Voice Tier System
| Tier | Agent Type | Voice Output |
|---|---|---|
| Silent | Explore, Plan (native) | ❌ No voice |
| Voiced | Engineer, Architect, etc. | ✅ Yes (via [AGENT:type]) |
| Main | Main agent (PAI) | ✅ Yes (fallback remains) |
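The tier table maps directly onto a small lookup. In this hypothetical sketch, an undefined agent type stands for the main agent. How the hook actually tells the main agent apart from subagents is an assumption here.

```typescript
// Hypothetical encoding of the voice tier table; names are illustrative.
type VoiceTier = "silent" | "voiced" | "main";

const NATIVE_SILENT_AGENTS: ReadonlySet<string> = new Set(["Explore", "Plan"]);

function voiceTier(agentType: string | undefined): VoiceTier {
  // No agent type: output comes from the main agent (PAI), which keeps its fallback.
  if (agentType === undefined) return "main";
  // Native research agents are silenced to avoid spam from parallel runs.
  if (NATIVE_SILENT_AGENTS.has(agentType)) return "silent";
  // Everything else (Engineer, Architect, ...) is voiced, but only via [AGENT:type].
  return "voiced";
}
```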
3. Runtime Volume Control
Problem
Volume settings are read once at server startup and cached. Users cannot adjust volume without restarting the voice server.
Solution
- `getVolumeSetting()` now reads from `~/.claude/VoiceServer/volume.json` on each call
- New CLI tool `VoiceServer/Tools/voice-volume.ts` for runtime volume adjustments
- Changes take effect immediately (no server restart needed)
- Priority: `volume.json` > `voices.json` `default_volume` > 0.5 (default)
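A minimal sketch of the fresh-read lookup and its priority chain follows. It rests on two assumptions. First, that `volume.json` stores `{ "volume": <0..1> }`; the key name is hypothetical. Second, that `voices.json` sits in the same directory; its path is also hypothetical. Reading two small JSON files per call should be cheap next to TTS synthesis, which is what makes a fresh read acceptable.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const DEFAULT_VOLUME = 0.5;

// Read a JSON file, returning undefined if it is missing or malformed.
function readJson(path: string): unknown {
  if (!existsSync(path)) return undefined;
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return undefined;
  }
}

// Accept only finite numbers, clamped to the 0..1 range.
function asVolume(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value)
    ? Math.min(1, Math.max(0, value))
    : undefined;
}

// Priority: volume.json "volume" > voices.json "default_volume" > 0.5.
function resolveVolume(volumeJson: unknown, voicesJson: unknown): number {
  const fromVolumeFile = asVolume((volumeJson as { volume?: unknown } | undefined)?.volume);
  if (fromVolumeFile !== undefined) return fromVolumeFile;
  const fromVoices = asVolume((voicesJson as { default_volume?: unknown } | undefined)?.default_volume);
  return fromVoices ?? DEFAULT_VOLUME;
}

// Called on every request (no caching), so edits to volume.json apply without a restart.
function getVolumeSetting(
  volumePath = join(homedir(), ".claude", "VoiceServer", "volume.json"),
  voicesPath = join(homedir(), ".claude", "VoiceServer", "voices.json"), // assumed location
): number {
  return resolveVolume(readJson(volumePath), readJson(voicesPath));
}
```

Splitting the pure `resolveVolume` out from the file reads keeps the priority chain testable without touching the real config directory.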
Usage
bun run VoiceServer/Tools/voice-volume.ts # Show current volume
bun run VoiceServer/Tools/voice-volume.ts 0.3 # Set to 30%
bun run VoiceServer/Tools/voice-volume.ts up # +10%
bun run VoiceServer/Tools/voice-volume.ts down # -10%
bun run VoiceServer/Tools/voice-volume.ts mute # 0%
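The argument handling in `voice-volume.ts` could look roughly like the sketch below. Several parts are assumptions, and the actual CLI may behave differently:

- the `nextVolume` helper;
- the 0.1 step, taken from the `+10%`/`-10%` comments;
- clamping out-of-range numbers to 0..1 instead of reading them as percentages.

```typescript
// Hypothetical core of voice-volume.ts: map a CLI argument to a new volume.
const STEP = 0.1;

function clampVolume(v: number): number {
  // Keep within 0..1 and round to two decimals to avoid drift like 0.7999999999999999.
  return Math.round(Math.min(1, Math.max(0, v)) * 100) / 100;
}

function nextVolume(current: number, arg: string | undefined): number {
  if (arg === undefined) return current; // no argument: just report the current volume
  switch (arg) {
    case "up":
      return clampVolume(current + STEP);
    case "down":
      return clampVolume(current - STEP);
    case "mute":
      return 0;
    default: {
      const value = Number(arg);
      if (!Number.isFinite(value)) {
        throw new Error(`Expected a number between 0 and 1, or up/down/mute; got "${arg}"`);
      }
      return clampVolume(value);
    }
  }
}
```

A CLI wrapper would read `process.argv[2]`, call `nextVolume`, and write the result to `volume.json`. That I/O is left out to keep the sketch pure.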
Changes
| File | Change |
|---|---|
| `Packs/pai-hook-system/src/hooks/lib/identity.ts` | Provider-aware `getVoiceId()` |
| `Releases/v2.3/.claude/hooks/lib/identity.ts` | Same update for release version |
| `.env.example` | Document new TTS configuration options |
| `Packs/pai-hook-system/src/hooks/AgentOutputCapture.hook.ts` | Remove generic patterns, add silent tier |
| `Releases/v2.3/.claude/hooks/AgentOutputCapture.hook.ts` | Same fix for release version |
| `Packs/pai-voice-system/src/VoiceServer/server.ts` | Fresh-read volume from config |
| `Releases/v2.3/.claude/VoiceServer/server.ts` | Same update for release version |
| `Packs/pai-voice-system/src/VoiceServer/Tools/voice-volume.ts` | NEW - Volume control CLI |
| `Releases/v2.3/.claude/VoiceServer/Tools/voice-volume.ts` | NEW - Volume control CLI |
Test plan
Provider-agnostic TTS
- [ ] Existing ElevenLabs setups work without changes
- [x] Setting `TTS_PROVIDER=google` switches to Google TTS
- [ ] Voice notifications play correctly with both providers
Voice spam fix
- [x] Explore agents don't trigger voice output
- [x] Plan agents don't trigger voice output
- [x] Engineer/Architect agents still get voice (via [AGENT:type] pattern)
- [x] Main agent fallback still works
Volume control
- [x] `voice-volume.ts` CLI shows current volume
- [x] Volume changes take effect without server restart
- [x] Priority chain works: volume.json > voices.json > default
🤖 Generated with Claude Code