Add Text-to-Speech as Accessibility Feature π―
Add Text-to-Speech as Accessibility Feature π―
Overview
This PR adds a comprehensive Text-to-Speech (TTS) workspace to VoiceInk, positioned as a disability accessibility tool for users who need high-quality speech synthesis. This directly addresses the significant limitations of Apple's built-in TTS system.
β‘ Important: This is a purely additive feature. Zero changes to existing functionality. We're not modifying, replacing, or removing anythingβjust adding a powerful new dimension that makes VoiceInk a complete accessibility suite.
Why This Makes VoiceInk a "Slam Dunk" App π
VoiceInk already excels at Speech-to-Text (transcription). Now, with Text-to-Speech, it becomes a complete bidirectional communication tool:
The Perfect Combination
Speech β Text (What VoiceInk already does brilliantly)
β
β¨ NEW β¨
β
Text β Speech (What this PR adds)
This creates an unmatched accessibility suite:
- Voice to Text β Transcribe meetings, dictate notes, capture thoughts
- Text to Voice β Listen to documents, hear your writing, consume content hands-free
- Round-Trip Workflow β Record audio, transcribe it, edit the text, then listen back with premium voices
No other app combines these capabilities at this quality level. This positions VoiceInk as THE go-to accessibility tool for users who need both transcription AND speech synthesis.
The Problem: Apple's TTS Limitations
Many users with visual impairments, reading disabilities (dyslexia), learning differences, or other accessibility needs rely on text-to-speech as an essential daily tool. Unfortunately, Apple's built-in TTS has critical issues:
-
Inconsistent Voice Quality π
- Some voices sound robotic and unnatural
- Quality varies dramatically between languages
- Many voices are uncomfortable for extended listening
-
Limited Expressiveness π’
- Flat, monotone intonation
- No emotional context or emphasis
- Makes long-form content exhausting to consume
-
No User Control ποΈ
- Can't fine-tune voice characteristics
- No adjustments for speed, tone, or style
- One-size-fits-all approach doesn't work for accessibility
-
Language Support Issues π
- Poor quality for many non-English voices
- Limited dialect support
- Inconsistent pronunciation
The Solution: Premium Voice Integration
By integrating ElevenLabs and OpenAI's TTS, VoiceInk now provides:
- β Natural-sounding voices comfortable for hours of listening
- β Expressive speech with proper intonation and emotional context
- β Customization options for voice style, speed, and delivery
- β Consistent quality across all content types and languages
- β Fallback to macOS voices when no API key is provided
This makes VoiceInk a perfect combination: powerful transcription + high-quality speech synthesis, all in one accessibility-focused application.
Features Added
π― Core Functionality
- Multiple Provider Support: ElevenLabs, OpenAI, Google Cloud TTS, + built-in macOS
- Simple 3-Step Workflow:
- Select a voice
- Enter or paste text
- Click Generate
- Voice Preview System: Listen to samples before generating
- Batch Generation: Process multiple segments separated by
--- - Audio Export: Save as
.m4a,.mp3, or.wav - Playback Controls: Speed (0.5Γ-2Γ), looping, timeline scrubbing
βΏ Accessibility Features
- No Feature Gates: Available to ALL users immediately
- Fallback Mode: Works with macOS voices if no API keys configured
- Clear Cost Display: Transparent pricing for cloud providers
- Generation History: Replay previous outputs
- Text Snippets: Save and reuse commonly spoken phrases
- Pronunciation Glossary: Custom rules for names/technical terms
π οΈ Advanced Capabilities
- Translation Support: Convert text to 50+ languages before speaking
- URL Import: Extract and speak content from web articles
- Smart Article Processing: AI-powered cleanup for readable content
- Transcript Generation: Export subtitles (SRT/VTT) with timestamps
- Voice Style Controls: Adjust emotion, stability, and clarity (ElevenLabs)
Technical Implementation
File Structure
VoiceInk/TTS/
βββ Models/ (13 files) - Data structures
βββ Services/ (17 files) - Provider integrations
βββ Utilities/ (11 files) - Helper functions
βββ ViewModels/ (1 file) - Main TTS logic
βββ Views/ (13 files) - UI components
Total: 55 new Swift files, 8,592 lines of code
Key Technical Features
- β Secure Storage: API keys stored in macOS Keychain
- β Modular Architecture: Each provider is a separate, testable service
- β Cost Estimation: Real-time calculation with character limits
- β AVFoundation Integration: Native audio playback and export
- β UI Consistency: Matches VoiceInk's design system (6px corners, CardBackground)
- β Memory Efficient: Streams large audio files
- β Error Handling: Comprehensive error messages and recovery
Integration Points
- ContentView: Added "Text to Speech" tab (no feature gate)
- SettingsView: New "Text-to-Speech" section for API configuration
- Navigation: Fully integrated with app's notification system
- Styling: Consistent with existing CardBackground and StyleConstants
Why This Matters
For Users with Disabilities
This feature transforms VoiceInk into a comprehensive accessibility suite:
- Speech-to-Text (Whisper) β Type with your voice
- Text-to-Speech (ElevenLabs/OpenAI) β Listen to any text
- AI Enhancement β Process and refine content
- Power Modes β Context-aware automation
Users can now:
- Transcribe spoken notes, then listen back with natural voices
- Import web articles, then have them read aloud expressively
- Write emails, then hear them before sending
- Learn from documents by listening instead of reading
For the VoiceInk Community
- π― Differentiator: Unique feature combination not found elsewhere
- Whisper + ElevenLabs in ONE app? That's a slam dunk.
- βΏ Accessibility Focus: Positions VoiceInk as THE inclusive, disability-friendly tool
- Not just "supports accessibility"βBUILT for accessibility
- πͺ Complete Solution: Solves the full communication loop
- No more switching between apps
- π User Retention: More reasons to keep VoiceInk as daily driver
- Users stay because it does EVERYTHING they need
- π° Growth Potential: Optional cloud providers (free start with macOS voices)
- Free tier gets users hooked, premium features drive revenue
- π Market Position: From "great transcription tool" to "complete accessibility suite"
- This is the kind of feature that gets featured on accessibility blogs
This doesn't just improve VoiceInkβit transforms it into a category leader.
Usage Example
Basic Workflow
1. Open VoiceInk β "Text to Speech" tab
2. Select provider: "Mac OS" (free) or "ElevenLabs" (API key)
3. Choose voice from dropdown
4. Paste text or click "Add Content" β "URL Import"
5. Click "Generate" (ββ΅)
6. Audio plays automatically β Export if needed
Advanced Workflow
1. Configure API keys: Settings β Text-to-Speech
2. Create text snippets: Save common phrases
3. Add pronunciation rules: Custom names/terms
4. Use batch mode: Separate segments with ---
5. Adjust voice style: Emotion, stability (ElevenLabs)
6. Export with timestamps: SRT/VTT subtitles
Testing Checklist
- [x] TTS workspace renders correctly
- [x] Settings section appears in main settings
- [x] Navigation from overflow menu works
- [x] Mac OS (built-in) provider functions without API keys
- [x] API key storage/retrieval from Keychain
- [x] Voice selection and preview
- [x] Text generation with cost estimation
- [x] Audio playback controls (play/pause/speed/loop)
- [x] Export functionality (.m4a, .mp3, .wav)
- [x] UI consistency with rest of app (6px corners)
- [x] No feature gates (available to all users)
Impact on Existing Features
β Zero Breaking Changes
This PR does not touch any existing VoiceInk functionality:
- β No modifications to transcription engine or Whisper integration
- β No changes to AI Enhancement or existing workflows
- β No alterations to Power Modes, shortcuts, or settings (except adding TTS section)
- β No removal of any existing features or capabilities
- β No performance impact on existing transcription workflows
β Purely Additive
What we're adding:
- β New sidebar tab: "Text to Speech" (sits alongside existing tabs)
- β New settings section: "Text-to-Speech" (separate from other settings)
- β New TTS directory: 55 self-contained files under
VoiceInk/TTS/ - β New navigation route: Doesn't interfere with existing routing
Think of it as a plugin: Completely modular, self-contained functionality that enhances VoiceInk without touching its core. If you never click the "Text to Speech" tab, your VoiceInk experience is unchanged.
Why This Makes VoiceInk Unstoppable π
Before This PR
VoiceInk was already an excellent transcription tool:
- Best-in-class local Whisper integration
- Power Modes for context-aware workflows
- AI enhancement for transcripts
- Strong accessibility focus
After This PR
VoiceInk becomes a complete accessibility ecosystem:
| User Need | VoiceInk Solution |
|---|---|
| "I need to write without typing" | β Speech-to-Text transcription |
| "I need to listen instead of reading" | β Text-to-Speech synthesis |
| "I need both in one place" | β Perfect combination |
| "I need it to work offline" | β Local Whisper + macOS voices |
| "I need premium quality voices" | β ElevenLabs/OpenAI integration |
| "I need it to be affordable" | β Free tier with macOS, optional cloud |
This is what users mean when they call something a "slam dunk app": It solves the complete problem, not just half of it.
Real-World Use Cases Enabled
For Users with Visual Impairments
- Before: Use system TTS with poor voice quality
- After: Use VoiceInk's premium voices for natural listening
- Round-trip: Dictate notes β Edit transcript β Listen back
For Users with Dyslexia
- Before: Struggle to read long documents
- After: Import web article β Generate high-quality audio β Listen comfortably
- Bonus: Adjust speed, replay sections, export for later
For Students with Learning Disabilities
- Before: Can write but struggle to proofread by reading
- After: Transcribe assignment β Hear it read back β Catch errors naturally
- Workflow: Write by voice β Edit visually β Review by ear
For Everyone
- Commute: Transcribe articles on computer β Export audio β Listen on phone
- Multitasking: Listen to documents while doing other tasks
- Accessibility: Options for different learning/processing styles
The magic is in the combination. Other apps do one or the other. VoiceInk now does both, seamlessly.
Future Enhancements
Potential follow-ups (not in this PR):
- [ ] Offline voice downloading for better macOS TTS
- [ ] SSML support for advanced pronunciation control
- [ ] Real-time streaming for faster feedback
- [ ] Voice cloning (ElevenLabs Professional tier)
- [ ] Multi-speaker dialogue mode
- [ ] Audiobook export with chapters
Credits
This feature was developed as part of the VoiceInk Community fork by @tmm22, with the goal of making VoiceInk a comprehensive accessibility tool. Special thanks to:
- ElevenLabs for natural voice synthesis API
- OpenAI for TTS and translation capabilities
- VoiceInk community for feature requests and feedback
Checklist
- [x] Code follows project style guidelines
- [x] All new code is properly commented
- [x] UI matches existing design system
- [x] No breaking changes to existing features
- [x] Feature works without API keys (macOS fallback)
- [x] API keys stored securely in Keychain
- [x] Comprehensive commit message included
- [x] Ready for review
Final Thoughts
This PR represents months of development distilled into a clean, additive feature that:
- Respects VoiceInk's existing excellence by not touching core functionality
- Completes the accessibility story by adding the missing piece
- Creates a moat that competitors can't easily replicate
- Empowers users with disabilities to do more with less friction
- Positions VoiceInk as a complete solution, not just a transcription tool
This is how you build a "slam dunk app": Take what's already great, and add the one thing that makes it complete. No compromises, no trade-offsβjust pure addition of value.
The Vision
VoiceInk isn't just about transcription anymore.
It's about accessibility in its fullest sense:
- Speak and be understood (Speech-to-Text)
- Read and be heard (Text-to-Speech)
- Choose your interface (Voice, Text, or Audio)
That's the slam dunk. That's the perfect combination.
Ready to merge π
This brings essential accessibility features to all VoiceInk users without changing a single line of existing transcription code. It's the best kind of feature addition: pure upside, zero risk.
How to Test
- Pull the branch:
add-tts-accessibility-feature - Launch VoiceInk: Everything works exactly as before
- Click "Text to Speech" tab: New workspace appears
- Generate some audio: Works immediately with macOS voices (no setup)
- Go back to transcription: Unchanged, perfect, exactly as you remember
See? Additive. Complete. Perfect.
Summary by cubic
Adds a new Text-to-Speech workspace with multiple providers (including macOS) for natural speech, plus previews, batch mode, export, and playback; purely additive via a new βText to Speechβ tab and settings. Also includes security hardening (production-safe logging, ephemeral sessions, API key validation) and a comprehensive security audit.
-
New Features
- Multiple providers with macOS fallback (no keys needed to start).
- Voice preview and provider-specific style controls.
- Batch generation using --- separators.
- Export to common audio formats; speed, loop, and scrub playback.
- URL import with article cleanup; translation to 50+ languages.
- Cost estimation and generation history.
- Text snippets and a pronunciation glossary.
- Optional SRT/VTT transcript generation.
-
Migration
- No breaking changes.
- New βText to Speechβ tab in the sidebar.
- New Settings > Text-to-Speech for API keys (stored in Keychain).
- Works out of the box with macOS voices; cloud providers are optional.
Written for commit f8ff67935721d1d25a2234abf66d018edf8d7034. Summary will update automatically on new commits.