feat(voice): Add provider-agnostic TTS support

Open Steffen025 opened this issue 2 days ago • 0 comments

Summary

This PR adds three voice system improvements:

  1. Provider-agnostic TTS support - Choose between ElevenLabs and Google Cloud TTS
  2. Fix voice spam from parallel agents - Prevent Explore/Plan agents from triggering voice output
  3. Runtime volume control - Adjust volume without restarting the server

1. Provider-Agnostic TTS Support

Problem

The current voice system assumes ElevenLabs as the only TTS provider. Users who want to use Google Cloud TTS cannot easily switch providers.

Solution

  • Add TTS_PROVIDER environment variable detection in getVoiceId() function
  • Support GOOGLE_TTS_VOICE_ID for Google Cloud TTS configuration
  • Maintain full backward compatibility with ElevenLabs (default behavior unchanged)
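The selection logic described above could look roughly like the sketch below. It is not the actual getVoiceId() from identity.ts: the function name resolveVoice, the VoiceSelection shape, and the elevenLabsVoiceId parameter are hypothetical, and only TTS_PROVIDER and GOOGLE_TTS_VOICE_ID come from this PR. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical sketch of provider-aware voice selection.
// Only TTS_PROVIDER and GOOGLE_TTS_VOICE_ID are taken from the PR text;
// every other name here is an assumption made for illustration.
type Env = Record<string, string | undefined>;

interface VoiceSelection {
  provider: "elevenlabs" | "google";
  voiceId: string; // an empty string means "let the server use its default voice"
}

function resolveVoice(env: Env, elevenLabsVoiceId: string): VoiceSelection {
  const provider = (env["TTS_PROVIDER"] ?? "").trim().toLowerCase();

  if (provider === "google") {
    // GOOGLE_TTS_VOICE_ID is optional, so a missing value becomes "".
    return { provider: "google", voiceId: (env["GOOGLE_TTS_VOICE_ID"] ?? "").trim() };
  }

  // Anything else (including an unset variable) keeps the ElevenLabs behavior.
  return { provider: "elevenlabs", voiceId: elevenLabsVoiceId };
}
```

In a real hook the first argument would presumably be process.env. Treating every value other than "google" as ElevenLabs is one way to keep existing setups unchanged.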

Usage

ElevenLabs (default - no changes needed):

# Existing setup continues to work

Google Cloud TTS:

TTS_PROVIDER=google
GOOGLE_TTS_VOICE_ID=en-US-Chirp3-HD-Charon  # Optional, uses server default if empty

2. Fix Voice Spam from Parallel Agents

Problem

Generic fallback patterns in extractCompletionMessage() matched any completion message, including those from Explore/Plan agents. This caused voice spam when running parallel research agents.

Solution

  1. Remove generic fallback patterns - subagent voice output is now opt-in only (the main agent fallback remains)
  2. Add silent tier check for native agents (Explore, Plan)
  3. Keep agent-specific patterns (🗣️ AgentName: and COMPLETED: [AGENT:type])

Voice Tier System

Tier   | Agent Type                | Voice Output
Silent | Explore, Plan (native)    | ❌ No voice
Voiced | Engineer, Architect, etc. | ✅ Yes (via [AGENT:type])
Main   | Main agent (PAI)          | ✅ Yes (fallback remains)
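A minimal sketch of the silent tier check and the opt-in patterns follows. It is an illustration under assumptions, not the code in AgentOutputCapture.hook.ts: the function name extractVoicedCompletion, the SILENT_AGENT_TYPES set, and the exact regular expressions (in particular, that the spoken text follows the marker on the same line) are hypothetical. The main agent fallback is deliberately left out.

```typescript
// Hypothetical sketch: subagent voice output is opt-in, and native agents are silent.
// The regular expressions below are assumptions about the message layout.
const SILENT_AGENT_TYPES = new Set(["explore", "plan"]);

// "🗣️ AgentName: message" (the emoji may or may not carry a variation selector)
const SPOKEN = /\u{1F5E3}\uFE0F?\s*[\w-]+:\s*(.+)/u;
// "COMPLETED: [AGENT:type] message"
const COMPLETED = /COMPLETED:\s*\[AGENT:[\w-]+\]\s*(.+)/;

function extractVoicedCompletion(agentType: string, output: string): string | null {
  // Silent tier: never speak, even if the output happens to match a pattern.
  if (SILENT_AGENT_TYPES.has(agentType.trim().toLowerCase())) return null;

  for (const pattern of [SPOKEN, COMPLETED]) {
    const message = pattern.exec(output)?.[1]?.trim();
    if (message) return message;
  }

  // No generic fallback: output without an explicit marker stays silent.
  return null;
}
```

Checking the tier before any pattern matching means an Explore or Plan agent cannot trigger voice output by accident, whatever its output contains.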

3. Runtime Volume Control

Problem

Volume settings are read once at server startup and cached. Users cannot adjust volume without restarting the voice server.

Solution

  • getVolumeSetting() now reads from ~/.claude/VoiceServer/volume.json on each call
  • New CLI tool VoiceServer/Tools/voice-volume.ts for runtime volume adjustments
  • Changes take effect immediately (no server restart needed)
  • Priority: volume.json > voices.json default_volume > 0.5 (default)
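The priority chain above can be sketched as a small pure function. This is an assumption-laden illustration rather than the getVolumeSetting() in server.ts: resolveVolume and asVolume are hypothetical names, the "volume" key inside volume.json is a guess at the file layout, and only default_volume and the 0.5 default come from this PR. The caller is expected to re-read and parse both files on every call, passing null when a file is missing or unreadable.

```typescript
// Hypothetical sketch of the priority chain:
// volume.json > voices.json default_volume > 0.5
// The "volume" key is an assumed layout for volume.json.
const DEFAULT_VOLUME = 0.5;

// Accept only finite numbers in the range 0..1; anything else counts as "not set".
function asVolume(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 1
    ? value
    : undefined;
}

function resolveVolume(
  volumeJson: { volume?: unknown } | null,
  voicesJson: { default_volume?: unknown } | null,
): number {
  // "??" (not "||") so that a stored volume of 0 (mute) is respected.
  return asVolume(volumeJson?.volume) ?? asVolume(voicesJson?.default_volume) ?? DEFAULT_VOLUME;
}
```

Using nullish coalescing matters here: with "||", a muted volume of 0 would be treated as missing and fall through to the next level.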

Usage

bun run VoiceServer/Tools/voice-volume.ts           # Show current volume
bun run VoiceServer/Tools/voice-volume.ts 0.3       # Set to 30%
bun run VoiceServer/Tools/voice-volume.ts up        # +10%
bun run VoiceServer/Tools/voice-volume.ts down      # -10%
bun run VoiceServer/Tools/voice-volume.ts mute      # 0%
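The argument handling behind these commands might resemble the following sketch. It does not reproduce voice-volume.ts: nextVolume is a hypothetical name, and it assumes that "up" and "down" move the volume by 10 percentage points (0.1) and that results are clamped to the range 0 to 1.

```typescript
// Hypothetical sketch of mapping a CLI argument to a new volume.
// Assumes up/down change the volume by 0.1 and that values are clamped to 0..1.
function nextVolume(current: number, arg: string): number {
  // Clamp to 0..1, then round to two decimals to avoid floating-point drift.
  const normalize = (v: number): number => Math.round(Math.min(1, Math.max(0, v)) * 100) / 100;

  switch (arg) {
    case "up":
      return normalize(current + 0.1);
    case "down":
      return normalize(current - 0.1);
    case "mute":
      return 0;
    default: {
      const parsed = Number(arg);
      if (arg.trim() === "" || !Number.isFinite(parsed)) {
        throw new Error(`Unrecognized volume argument: ${arg}`);
      }
      return normalize(parsed);
    }
  }
}
```

The CLI would then write the result to volume.json, where the server picks it up on its next read.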

Changes

File | Change
Packs/pai-hook-system/src/hooks/lib/identity.ts | Provider-aware getVoiceId()
Releases/v2.3/.claude/hooks/lib/identity.ts | Same update for release version
.env.example | Document new TTS configuration options
Packs/pai-hook-system/src/hooks/AgentOutputCapture.hook.ts | Remove generic patterns, add silent tier
Releases/v2.3/.claude/hooks/AgentOutputCapture.hook.ts | Same fix for release version
Packs/pai-voice-system/src/VoiceServer/server.ts | Fresh-read volume from config
Releases/v2.3/.claude/VoiceServer/server.ts | Same update for release version
Packs/pai-voice-system/src/VoiceServer/Tools/voice-volume.ts | NEW - Volume control CLI
Releases/v2.3/.claude/VoiceServer/Tools/voice-volume.ts | NEW - Volume control CLI

Test plan

Provider-agnostic TTS

  • [ ] Existing ElevenLabs setups work without changes
  • [x] Setting TTS_PROVIDER=google switches to Google TTS
  • [ ] Voice notifications play correctly with both providers

Voice spam fix

  • [x] Explore agents don't trigger voice output
  • [x] Plan agents don't trigger voice output
  • [x] Engineer/Architect agents still get voice (via [AGENT:type] pattern)
  • [x] Main agent fallback still works

Volume control

  • [x] voice-volume.ts CLI shows current volume
  • [x] Volume changes take effect without server restart
  • [x] Priority chain works: volume.json > voices.json > default

🤖 Generated with Claude Code

Steffen025, Jan 17 '26 12:01