Add Text-to-Speech as Accessibility Feature 🎯

Open tmm22 opened this issue 1 month ago • 1 comments

Overview

This PR adds a comprehensive Text-to-Speech (TTS) workspace to VoiceInk, positioned as a disability accessibility tool for users who need high-quality speech synthesis. This directly addresses the significant limitations of Apple's built-in TTS system.

⚡ Important: This is a purely additive feature. Zero changes to existing functionality. We're not modifying, replacing, or removing anything—just adding a powerful new dimension that makes VoiceInk a complete accessibility suite.

Why This Makes VoiceInk a "Slam Dunk" App 🏀

VoiceInk already excels at Speech-to-Text (transcription). Now, with Text-to-Speech, it becomes a complete bidirectional communication tool:

The Perfect Combination

Speech → Text (What VoiceInk already does brilliantly)
    ↓
  ✨ NEW ✨
    ↓
Text → Speech (What this PR adds)

This creates an unmatched accessibility suite:

Voice to Text → Transcribe meetings, dictate notes, capture thoughts
Text to Voice → Listen to documents, hear your writing, consume content hands-free
Round-Trip Workflow → Record audio, transcribe it, edit the text, then listen back with premium voices

As far as we know, no other app combines these capabilities at this quality level. This positions VoiceInk as THE go-to accessibility tool for users who need both transcription AND speech synthesis.

The Problem: Apple's TTS Limitations

Many users with visual impairments, reading disabilities (dyslexia), learning differences, or other accessibility needs rely on text-to-speech as an essential daily tool. Unfortunately, Apple's built-in TTS has critical issues:

Inconsistent Voice Quality 🎭
- Some voices sound robotic and unnatural
- Quality varies dramatically between languages
- Many voices are uncomfortable for extended listening
Limited Expressiveness 📢
- Flat, monotone intonation
- No emotional context or emphasis
- Makes long-form content exhausting to consume
No User Control 🎚️
- Can't fine-tune voice characteristics
- No adjustments for speed, tone, or style
- One-size-fits-all approach doesn't work for accessibility
Language Support Issues 🌍
- Poor quality for many non-English voices
- Limited dialect support
- Inconsistent pronunciation

The Solution: Premium Voice Integration

By integrating cloud TTS providers such as ElevenLabs and OpenAI, VoiceInk now provides:

✅ Natural-sounding voices comfortable for hours of listening
✅ Expressive speech with proper intonation and emotional context
✅ Customization options for voice style, speed, and delivery
✅ Consistent quality across all content types and languages
✅ Fallback to macOS voices when no API key is provided
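
The fallback behavior in the last item could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `TTSProvider`, `resolveProvider`, and the key dictionary are hypothetical names, and the real implementation reads keys from the Keychain rather than from a dictionary.

```swift
import Foundation

// Hypothetical provider list for illustration; not VoiceInk's actual type.
enum TTSProvider {
    case elevenLabs
    case openAI
    case macOS   // built-in voices, no API key required
}

/// Returns the requested provider when a non-blank key is available for it,
/// otherwise falls back to the built-in macOS voices.
func resolveProvider(requested: TTSProvider,
                     apiKeys: [TTSProvider: String]) -> TTSProvider {
    // The built-in provider never needs a key.
    guard requested != .macOS else { return .macOS }
    if let key = apiKeys[requested],
       !key.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
        return requested
    }
    // Missing or blank key: fall back instead of failing.
    return .macOS
}
```

With no keys configured, `resolveProvider(requested: .elevenLabs, apiKeys: [:])` resolves to `.macOS`, which matches the "works immediately with macOS voices" behavior described in this PR.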

This makes VoiceInk a perfect combination: powerful transcription + high-quality speech synthesis, all in one accessibility-focused application.

Features Added

🎯 Core Functionality

Multiple Provider Support: ElevenLabs, OpenAI, Google Cloud TTS, + built-in macOS
Simple 3-Step Workflow:
1. Select a voice
2. Enter or paste text
3. Click Generate
Voice Preview System: Listen to samples before generating
Batch Generation: Process multiple segments separated by ---
Audio Export: Save as .m4a, .mp3, or .wav
Playback Controls: Speed (0.5×-2×), looping, timeline scrubbing
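
As a rough sketch of how batch mode might split input, assuming the `---` separator must sit on its own line (the PR does not spell this out) and using a hypothetical `splitIntoSegments` helper:

```swift
import Foundation

/// Splits text into segments on lines that contain only `---`.
/// Blank segments are dropped. Hypothetical helper, not the PR's actual code.
func splitIntoSegments(_ text: String) -> [String] {
    var segments: [String] = []
    var current: [String] = []
    for line in text.components(separatedBy: .newlines) {
        if line.trimmingCharacters(in: .whitespaces) == "---" {
            // Separator line: close the current segment.
            segments.append(current.joined(separator: "\n"))
            current = []
        } else {
            current.append(line)
        }
    }
    // Close the trailing segment after the last separator.
    segments.append(current.joined(separator: "\n"))
    return segments
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
}
```

Matching whole lines rather than the raw substring means text such as `a---b` stays a single segment.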

♿ Accessibility Features

No Feature Gates: Available to ALL users immediately
Fallback Mode: Works with macOS voices if no API keys are configured
Clear Cost Display: Transparent pricing for cloud providers
Generation History: Replay previous outputs
Text Snippets: Save and reuse commonly spoken phrases
Pronunciation Glossary: Custom rules for names/technical terms
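
One way a pronunciation glossary could work is a whole-word substitution pass over the text before it is sent to the provider. The sketch below assumes that approach; `PronunciationRule` and `applyGlossary` are hypothetical names, and the PR's actual rule format may differ.

```swift
import Foundation

/// One hypothetical glossary rule: speak `term` as `spokenAs`.
struct PronunciationRule {
    let term: String
    let spokenAs: String
}

/// Replaces whole-word, case-insensitive matches of each term.
/// Assumes terms begin and end with a letter or digit, since `\b`
/// only matches at word-character boundaries.
func applyGlossary(_ rules: [PronunciationRule], to text: String) -> String {
    var result = text
    for rule in rules {
        // Escape the term so punctuation in it is matched literally.
        let pattern = "\\b" + NSRegularExpression.escapedPattern(for: rule.term) + "\\b"
        result = result.replacingOccurrences(
            of: pattern,
            // Escape the replacement so `$` and `\` are not read as template syntax.
            with: NSRegularExpression.escapedTemplate(for: rule.spokenAs),
            options: [.regularExpression, .caseInsensitive]
        )
    }
    return result
}
```

For example, a rule mapping `SQL` to `sequel` rewrites "Learn SQL" but leaves "SQLite" untouched, because the match is anchored to word boundaries.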

🛠️ Advanced Capabilities

Translation Support: Convert text to 50+ languages before speaking
URL Import: Extract and speak content from web articles
Smart Article Processing: AI-powered cleanup for readable content
Transcript Generation: Export subtitles (SRT/VTT) with timestamps
Voice Style Controls: Adjust emotion, stability, and clarity (ElevenLabs)
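
For the subtitle export, SRT cues use `HH:MM:SS,mmm` timestamps (WebVTT uses a period instead of the comma). A minimal sketch of the formatting step, with hypothetical helper names and assuming cue timings are available in seconds:

```swift
import Foundation

/// Formats a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
/// Negative input is clamped to zero.
func srtTimestamp(_ seconds: Double) -> String {
    let totalMillis = Int((max(0, seconds) * 1000).rounded())
    let millis = totalMillis % 1000
    let totalSeconds = totalMillis / 1000
    return String(format: "%02d:%02d:%02d,%03d",
                  totalSeconds / 3600,
                  (totalSeconds / 60) % 60,
                  totalSeconds % 60,
                  millis)
}

/// Builds one SRT cue: index line, time range line, then the text.
func srtCue(index: Int, start: Double, end: Double, text: String) -> String {
    return "\(index)\n\(srtTimestamp(start)) --> \(srtTimestamp(end))\n\(text)\n"
}
```

How the PR derives the cue timings themselves (for example, from provider metadata or audio duration) is not stated here, so that part is left out of the sketch.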

Technical Implementation

File Structure

VoiceInk/TTS/
├── Models/          (13 files) - Data structures
├── Services/        (17 files) - Provider integrations
├── Utilities/       (11 files) - Helper functions
├── ViewModels/      (1 file)  - Main TTS logic
└── Views/           (13 files) - UI components

Total: 55 new Swift files, 8,592 lines of code

Key Technical Features

✅ Secure Storage: API keys stored in macOS Keychain
✅ Modular Architecture: Each provider is a separate, testable service
✅ Cost Estimation: Real-time calculation with character limits
✅ AVFoundation Integration: Native audio playback and export
✅ UI Consistency: Matches VoiceInk's design system (6px corners, CardBackground)
✅ Memory Efficient: Streams large audio files
✅ Error Handling: Comprehensive error messages and recovery
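
To illustrate the cost estimation item, here is a minimal sketch of a per-character estimate with a limit check. The type and function names are hypothetical, and the rate and limit are caller-supplied placeholders rather than real provider pricing.

```swift
/// Result of a hypothetical cost estimate for one generation request.
struct CostEstimate {
    let characterCount: Int
    let exceedsLimit: Bool
    let estimatedCost: Double
}

/// Estimates cost from character count at a given rate per 1,000 characters.
/// Note: `text.count` counts user-perceived characters; a provider may
/// count bytes or code points instead, so treat the result as approximate.
func estimateCost(for text: String,
                  ratePerThousandCharacters: Double,
                  characterLimit: Int) -> CostEstimate {
    let count = text.count
    return CostEstimate(
        characterCount: count,
        exceedsLimit: count > characterLimit,
        estimatedCost: Double(count) / 1000.0 * ratePerThousandCharacters
    )
}
```

Because the function is pure, it can be re-run on every keystroke to keep the displayed estimate current.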

Integration Points

ContentView: Added "Text to Speech" tab (no feature gate)
SettingsView: New "Text-to-Speech" section for API configuration
Navigation: Fully integrated with the app's notification system
Styling: Consistent with existing CardBackground and StyleConstants

Why This Matters

For Users with Disabilities

This feature transforms VoiceInk into a comprehensive accessibility suite:

Speech-to-Text (Whisper) → Type with your voice
Text-to-Speech (ElevenLabs/OpenAI) → Listen to any text
AI Enhancement → Process and refine content
Power Modes → Context-aware automation

Users can now:

Transcribe spoken notes, then listen back with natural voices
Import web articles, then have them read aloud expressively
Write emails, then hear them before sending
Learn from documents by listening instead of reading

For the VoiceInk Community

🎯 Differentiator: Unique feature combination not found elsewhere
- Whisper + ElevenLabs in ONE app? That's a slam dunk.
♿ Accessibility Focus: Positions VoiceInk as THE inclusive, disability-friendly tool
- Not just "supports accessibility"—BUILT for accessibility
💪 Complete Solution: Solves the full communication loop
- No more switching between apps
🚀 User Retention: More reasons to keep VoiceInk as a daily driver
- Users stay because it does EVERYTHING they need
💰 Growth Potential: Optional cloud providers (free start with macOS voices)
- Free tier gets users hooked, premium features drive revenue
🏆 Market Position: From "great transcription tool" to "complete accessibility suite"
- This is the kind of feature that gets featured on accessibility blogs

This doesn't just improve VoiceInk—it transforms it into a category leader.

Usage Example

Basic Workflow

1. Open VoiceInk → "Text to Speech" tab
2. Select provider: "Mac OS" (free) or "ElevenLabs" (API key)
3. Choose voice from dropdown
4. Paste text or click "Add Content" → "URL Import"
5. Click "Generate" (⌘↵)
6. Audio plays automatically → Export if needed

Advanced Workflow

1. Configure API keys: Settings → Text-to-Speech
2. Create text snippets: Save common phrases
3. Add pronunciation rules: Custom names/terms
4. Use batch mode: Separate segments with ---
5. Adjust voice style: Emotion, stability (ElevenLabs)
6. Export with timestamps: SRT/VTT subtitles

Testing Checklist

[x] TTS workspace renders correctly
[x] Settings section appears in main settings
[x] Navigation from overflow menu works
[x] Mac OS (built-in) provider functions without API keys
[x] API key storage/retrieval from Keychain
[x] Voice selection and preview
[x] Text generation with cost estimation
[x] Audio playback controls (play/pause/speed/loop)
[x] Export functionality (.m4a, .mp3, .wav)
[x] UI consistency with rest of app (6px corners)
[x] No feature gates (available to all users)

Impact on Existing Features

✅ Zero Breaking Changes

This PR does not touch any existing VoiceInk functionality:

❌ No modifications to transcription engine or Whisper integration
❌ No changes to AI Enhancement or existing workflows
❌ No alterations to Power Modes, shortcuts, or settings (except adding the TTS section)
❌ No removal of any existing features or capabilities
❌ No performance impact on existing transcription workflows

✅ Purely Additive

What we're adding:

➕ New sidebar tab: "Text to Speech" (sits alongside existing tabs)
➕ New settings section: "Text-to-Speech" (separate from other settings)
➕ New TTS directory: 55 self-contained files under VoiceInk/TTS/
➕ New navigation route: Doesn't interfere with existing routing

Think of it as a plugin: Completely modular, self-contained functionality that enhances VoiceInk without touching its core. If you never click the "Text to Speech" tab, your VoiceInk experience is unchanged.

Why This Makes VoiceInk Unstoppable 🚀

Before This PR

VoiceInk was already an excellent transcription tool:

Best-in-class local Whisper integration
Power Modes for context-aware workflows
AI enhancement for transcripts
Strong accessibility focus

After This PR

VoiceInk becomes a complete accessibility ecosystem:

User Need	VoiceInk Solution
"I need to write without typing"	✅ Speech-to-Text transcription
"I need to listen instead of reading"	✅ Text-to-Speech synthesis
"I need both in one place"	✅ Perfect combination
"I need it to work offline"	✅ Local Whisper + macOS voices
"I need premium quality voices"	✅ ElevenLabs/OpenAI integration
"I need it to be affordable"	✅ Free tier with macOS, optional cloud

This is what users mean when they call something a "slam dunk app": It solves the complete problem, not just half of it.

Real-World Use Cases Enabled

For Users with Visual Impairments

Before: Use system TTS with poor voice quality
After: Use VoiceInk's premium voices for natural listening
Round-trip: Dictate notes → Edit transcript → Listen back

For Users with Dyslexia

Before: Struggle to read long documents
After: Import web article → Generate high-quality audio → Listen comfortably
Bonus: Adjust speed, replay sections, export for later

For Students with Learning Disabilities

Before: Can write but struggle to proofread by reading
After: Transcribe assignment → Hear it read back → Catch errors naturally
Workflow: Write by voice → Edit visually → Review by ear

For Everyone

Commute: Convert articles to audio on your computer → Export audio → Listen on phone
Multitasking: Listen to documents while doing other tasks
Accessibility: Options for different learning/processing styles

The magic is in the combination. Other apps do one or the other. VoiceInk now does both, seamlessly.

Future Enhancements

Potential follow-ups (not in this PR):

[ ] Offline voice downloading for better macOS TTS
[ ] SSML support for advanced pronunciation control
[ ] Real-time streaming for faster feedback
[ ] Voice cloning (ElevenLabs Professional tier)
[ ] Multi-speaker dialogue mode
[ ] Audiobook export with chapters

Credits

This feature was developed as part of the VoiceInk Community fork by @tmm22, with the goal of making VoiceInk a comprehensive accessibility tool. Special thanks to:

ElevenLabs for natural voice synthesis API
OpenAI for TTS and translation capabilities
VoiceInk community for feature requests and feedback

Checklist

[x] Code follows project style guidelines
[x] All new code is properly commented
[x] UI matches existing design system
[x] No breaking changes to existing features
[x] Feature works without API keys (macOS fallback)
[x] API keys stored securely in Keychain
[x] Comprehensive commit message included
[x] Ready for review

Final Thoughts

This PR represents months of development distilled into a clean, additive feature that:

Respects VoiceInk's existing excellence by not touching core functionality
Completes the accessibility story by adding the missing piece
Creates a moat that competitors can't easily replicate
Empowers users with disabilities to do more with less friction
Positions VoiceInk as a complete solution, not just a transcription tool

This is how you build a "slam dunk app": Take what's already great, and add the one thing that makes it complete. No compromises, no trade-offs—just pure addition of value.

The Vision

VoiceInk isn't just about transcription anymore.
It's about accessibility in its fullest sense:
  - Speak and be understood (Speech-to-Text)
  - Read and be heard (Text-to-Speech)
  - Choose your interface (Voice, Text, or Audio)
  
That's the slam dunk. That's the perfect combination.

Ready to merge 🚀

This brings essential accessibility features to all VoiceInk users without changing a single line of existing transcription code. It's the best kind of feature addition: pure upside, zero risk.

How to Test

Pull the branch: add-tts-accessibility-feature
Launch VoiceInk: Everything works exactly as before
Click "Text to Speech" tab: New workspace appears
Generate some audio: Works immediately with macOS voices (no setup)
Go back to transcription: Unchanged, perfect, exactly as you remember

See? Additive. Complete. Perfect.

Summary by cubic

Adds a new Text-to-Speech workspace with multiple providers (including macOS) for natural speech, plus previews, batch mode, export, and playback; purely additive via a new “Text to Speech” tab and settings. Also includes security hardening (production-safe logging, ephemeral sessions, API key validation) and a comprehensive security audit.

New Features
- Multiple providers with macOS fallback (no keys needed to start).
- Voice preview and provider-specific style controls.
- Batch generation using --- separators.
- Export to common audio formats; speed, loop, and scrub playback.
- URL import with article cleanup; translation to 50+ languages.
- Cost estimation and generation history.
- Text snippets and a pronunciation glossary.
- Optional SRT/VTT transcript generation.
Migration
- No breaking changes.
- New “Text to Speech” tab in the sidebar.
- New Settings > Text-to-Speech for API keys (stored in Keychain).
- Works out of the box with macOS voices; cloud providers are optional.

Written for commit f8ff67935721d1d25a2234abf66d018edf8d7034. Summary will update automatically on new commits.

Nov 01 '25 05:11 tmm22