Add High-Quality Text-to-Speech for Accessibility
Problem Statement
Users with disabilities who rely on text-to-speech (TTS) face significant barriers when using macOS:
Apple's Built-in TTS Falls Short for Accessibility Needs
Critical Issues:
- Unusable for Long Documents
  - Robotic, monotone voices become exhausting after 5-10 minutes
  - Poor intonation makes comprehension difficult
  - No emotional context or natural pacing
  - Users with visual impairments or dyslexia need to listen for hours daily
- Inconsistent Voice Quality
  - Some voices are barely intelligible
  - Quality varies dramatically between languages
  - Many accents/dialects have poor representation
  - No control over voice characteristics
- Limited Customization
  - Can't adjust emotional tone, stability, or clarity
  - Speed control is basic (no fine-tuning)
  - No voice style options for different content types
  - One-size-fits-all doesn't work for accessibility
- Poor Language Support
  - Non-English voices often sound worse
  - Limited dialect variations
  - Pronunciation issues with technical terms
  - No customization for names or specialized vocabulary
The Impact on Disabled Users
Real-world consequences:
- Students with dyslexia struggle to consume course materials
- Users with visual impairments face fatigue from poor voice quality
- People with ADHD can't focus due to monotone delivery
- Non-native speakers can't rely on TTS for learning
- Professionals can't use TTS for work documents (too exhausting)
Current "solutions" don't work:
- Web-based TTS services: Require copying/pasting, no offline access, poor UX
- Separate Mac apps: Force users to juggle multiple tools, break workflows
- Mobile TTS apps: Not practical for desktop work, small screens
- Browser extensions: Limited to web content, inconsistent quality
Why macOS Needs Better Native TTS Support
The gap:
- macOS has excellent Speech-to-Text (dictation)
- macOS has terrible Text-to-Speech (speech synthesis) for accessibility
- No native solution bridges this gap
What users need:
- Natural-sounding voices comfortable for extended listening
- Integration with existing workflows
- Offline capability with online premium options
- Control over voice characteristics and pacing
- Support for custom pronunciation and terminology
Why VoiceInk is the Perfect Solution
VoiceInk already excels at Speech-to-Text (transcription via Whisper). Adding Text-to-Speech creates a complete accessibility suite that no other Mac app provides:
The Perfect Combination
Speech → Text (VoiceInk's existing strength)
↓
NEW
↓
Text → Speech (what this feature adds)
Bidirectional communication:
- Voice to Text: Transcribe meetings, dictate notes, capture thoughts
- Text to Voice: Listen to documents, hear your writing, consume content
- Round-Trip Workflows: Record audio, transcribe, edit text, listen back with premium voices
No other Mac app does this. VoiceInk would become THE go-to accessibility tool.
Why This Belongs in VoiceInk
Natural fit:
- Already accessibility-focused - VoiceInk's mission aligns with disability support
- Complements existing features - Completes the "voice interface" story
- Same user base - People who need STT often need TTS
- Unified workflow - One app for all voice/text needs
- Local-first - Matches VoiceInk's privacy focus (offline option)
Strategic positioning:
- Market differentiation - Unique feature combination
- Category leader - From "great transcription" to "complete accessibility suite"
- Competitive moat - Hard for competitors to replicate this combo
- Value proposition - Justifies premium pricing with premium voices
Proposed Solution
Integrate high-quality Text-to-Speech using premium providers (ElevenLabs, OpenAI, Google Cloud TTS) with a free fallback to the built-in macOS voices.
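To make the fallback idea concrete, here is a minimal sketch of how provider selection could work. The type and function names (`TTSProviderKind`, `selectProvider`) are hypothetical and are not taken from PR #354; the only assumption carried over from the proposal is that cloud providers need an API key while macOS voices do not.

```swift
import Foundation

// Hypothetical provider kinds; names are illustrative, not from PR #354.
enum TTSProviderKind: String {
    case elevenLabs, openAI, macOS
}

// Returns the first preferred provider that has a non-empty API key.
// Falls back to the built-in macOS voices, which need no key.
func selectProvider(preferred: [TTSProviderKind],
                    apiKeys: [TTSProviderKind: String]) -> TTSProviderKind {
    for kind in preferred {
        if kind == .macOS { return .macOS }
        if let key = apiKeys[kind], !key.isEmpty { return kind }
    }
    return .macOS
}

// With no keys configured, the user still gets working speech output.
let chosen = selectProvider(preferred: [.elevenLabs, .openAI], apiKeys: [:])
print("Using provider: \(chosen.rawValue)")
```

Keeping the fallback inside one pure function makes the "works immediately, no API keys required" behavior easy to unit test.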
Key Features
Core Functionality:
- Multiple provider support: ElevenLabs, OpenAI, Google Cloud TTS, + built-in macOS
- Simple workflow: Select voice → Enter text → Generate
- Voice preview system before generating
- Batch generation for long documents
- Audio export (.m4a, .mp3, .wav)
- Playback controls: Speed (0.5× to 2×), looping, timeline scrubbing
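Batch generation for long documents usually means splitting the text into pieces that fit a provider's per-request limit. The sketch below shows one way to do that, preferring sentence boundaries. The function name `chunkText` and the `maxLength` value are assumptions for illustration; actual limits differ per provider and are not stated in this proposal.

```swift
import Foundation

// Splits text into chunks of at most `maxLength` characters, preferring to
// break after sentence-ending punctuation. Joining the chunks reproduces
// the original text exactly.
func chunkText(_ text: String, maxLength: Int) -> [String] {
    precondition(maxLength > 0, "maxLength must be positive")

    // Pass 1: cut the text after each '.', '!' or '?'.
    var sentences: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            sentences.append(current)
            current = ""
        }
    }
    if !current.isEmpty { sentences.append(current) }

    // Pass 2: pack sentences into chunks without exceeding the limit.
    var chunks: [String] = []
    var chunk = ""
    for sentence in sentences {
        var remaining = Substring(sentence)
        // A single sentence longer than the limit is hard-split.
        while remaining.count > maxLength {
            if !chunk.isEmpty { chunks.append(chunk); chunk = "" }
            chunks.append(String(remaining.prefix(maxLength)))
            remaining = remaining.dropFirst(maxLength)
        }
        if chunk.count + remaining.count > maxLength {
            chunks.append(chunk)
            chunk = ""
        }
        chunk.append(contentsOf: remaining)
    }
    if !chunk.isEmpty { chunks.append(chunk) }
    return chunks
}
```

Each chunk can then be sent as its own request and the resulting audio segments concatenated in order.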
Accessibility Features:
- Works immediately with macOS voices (no API keys required)
- Natural voices from ElevenLabs/OpenAI for comfort during long listening
- Voice style controls (emotion, stability, clarity)
- Pronunciation glossary for custom terms
- Translation support (50+ languages)
- Speed adjustment for different comprehension needs
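A pronunciation glossary can be as simple as a text substitution pass that runs before synthesis. This is a sketch under that assumption; `applyGlossary` is a hypothetical helper, and the example entries are illustrative respellings rather than anything defined in PR #354.

```swift
import Foundation

// Replaces glossary terms with phonetic respellings before the text is
// handed to a TTS engine. Matching is case-sensitive plain substring
// replacement. Terms are applied in sorted order so the result is
// deterministic; a respelling that itself contains another glossary term
// would be rewritten again, which a production version should guard against.
func applyGlossary(_ glossary: [String: String], to text: String) -> String {
    var result = text
    for term in glossary.keys.sorted() {
        guard let spoken = glossary[term] else { continue }
        result = result.replacingOccurrences(of: term, with: spoken)
    }
    return result
}

let glossary = ["SQL": "sequel", "Nguyen": "win"]
print(applyGlossary(glossary, to: "Dr. Nguyen teaches SQL."))
```

Providers that accept SSML or their own pronunciation dictionaries may offer a more precise mechanism; plain substitution has the advantage of working the same way across every provider, including the macOS voices.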
Advanced Capabilities:
- URL import: Extract and speak web articles
- Text snippets library: Save commonly used phrases
- Transcript generation: Export subtitles with timestamps
- Cost estimation: Transparent pricing for cloud providers
- Generation history: Replay previous outputs
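Cost estimation could be sketched as follows, assuming a provider that bills by character count. The function name and the rate used in the example are placeholders, not real pricing for any provider; actual rates and billing units would need to come from each provider's published pricing.

```swift
import Foundation

// Estimates the cost of synthesizing `text` at a per-1,000-character rate.
// The rate is supplied by the caller; the value used below is a
// placeholder, not real provider pricing.
func estimateCost(for text: String, ratePerThousandCharacters: Double) -> Double {
    let characterCount = Double(text.count)
    return characterCount / 1000.0 * ratePerThousandCharacters
}

let draft = String(repeating: "a", count: 2_000)
let estimate = estimateCost(for: draft, ratePerThousandCharacters: 0.25)
print(String(format: "Estimated cost: $%.2f", estimate))
```

Showing this number before generation is what makes the pricing transparent: the user sees the estimate and can choose the free macOS voices instead.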
Why Premium Voices Matter for Accessibility
ElevenLabs/OpenAI vs Apple TTS:
| Feature | Apple TTS | Premium TTS (ElevenLabs/OpenAI) |
|---|---|---|
| Naturalness | Robotic | Human-like |
| Long listening | Exhausting (after 5-10 min) | Comfortable (hours) |
| Emotional range | Monotone | Expressive |
| Customization | Minimal | Extensive |
| Pronunciation | Poor | Excellent |
| Multiple voices | Limited | Hundreds |
For accessibility, quality isn't a luxury; it's essential.
Real-World Use Cases
For Users with Visual Impairments
- Current: Struggle with robotic macOS voices, fatigue quickly
- With VoiceInk: Use premium voices for natural listening experience
- Workflow: Import documents → Generate audio → Listen comfortably for hours
For Users with Dyslexia
- Current: Reading long documents is exhausting and error-prone
- With VoiceInk: Import web articles or PDFs → Listen with adjustable speed
- Workflow: Copy text → Paste in VoiceInk → Generate → Listen while doing other tasks
For Students with Learning Disabilities
- Current: Struggle to proofread written work by reading
- With VoiceInk: Transcribe assignments by voice → Listen back to catch errors
- Workflow: Dictate essay → Edit transcript → Hear it read back → Submit confidently
For Professionals Who Need Both
- Current: Use separate apps for dictation and TTS (clunky)
- With VoiceInk: One app for all voice/text needs
- Workflow:
  - Transcribe meeting notes
  - Edit and clean up transcript
  - Generate audio summary
  - Share both text and audio with team
Why This is Urgent
The accessibility community needs this now:
- Existing solutions are inadequate
  - Apple TTS is painful for long-form content
  - Web services are fragmented and require internet
  - No unified Mac solution exists
- Remote work/education increased TTS demand
  - More digital content to consume
  - More long-form documents
  - More need for multimodal accessibility
- AI voice quality is finally good enough
  - ElevenLabs and OpenAI TTS are production-ready
  - Natural enough for daily use
  - Affordable for individual users
- VoiceInk has the infrastructure
  - Already handles audio processing
  - Already has API integration patterns
  - Already has the right user base
Implementation Available
A full implementation of this feature is ready for review:
Pull Request #354: Add Text-to-Speech as Accessibility Feature
What's included:
- 55 new Swift files (8,592 lines of code)
- Complete TTS workspace with modern UI
- Support for ElevenLabs, OpenAI, Google Cloud TTS, macOS voices
- Settings integration with secure API key storage
- Batch processing, audio export, playback controls
- Zero breaking changes (purely additive)
- Consistent with VoiceInk's design system
- Well-documented and tested
The work is done. This issue is to discuss whether to merge it.
Discussion Points
For the VoiceInk community to consider:
- Does this align with VoiceInk's mission?
  - Is completing the voice/text accessibility loop worthwhile?
  - Should VoiceInk be "just transcription" or a "complete accessibility suite"?
- Is the implementation acceptable?
  - Review PR #354 for code quality, architecture, UX
  - Are there concerns about maintenance burden?
  - Does it integrate well with existing features?
- What about scope creep?
  - Is this "feature bloat" or natural evolution?
  - Does it enhance or distract from core transcription?
  - How do users feel about TTS in a transcription app?
- Accessibility priority?
  - How important is it to serve users with disabilities?
  - Is solving the "Apple TTS problem" valuable to the community?
  - Would this make VoiceInk more inclusive?
Alternatives Considered
Why not just tell users to use other TTS apps?
- Workflow fragmentation - Forces context switching between apps
- No integration - Can't leverage VoiceInk's existing transcripts
- Poor UX - Separate apps don't understand each other
- Cost - Premium TTS apps cost $20-50/month separately
Why not just improve Apple's TTS?
- Not in our control - We can't fix Apple's voices
- Slow progress - Apple TTS hasn't improved significantly in years
- Immediate solution - Premium APIs available now
Why not just wait for Apple to fix it?
- No indication they will - Apple TTS has been mediocre for a decade
- Users need help now - People with disabilities can't wait
- Competitive advantage - VoiceInk can lead where Apple lags
Call to Action
For maintainers: Please review PR #354 and consider merging this accessibility-focused feature.
For users: If you need better TTS on Mac, please comment with your use case. Your voice matters.
For accessibility advocates: Share this issue with communities who would benefit from better Mac TTS.
Related Links
- Pull Request: #354 - Full implementation ready for review
- ElevenLabs TTS: https://elevenlabs.io - Example of premium voice quality
- OpenAI TTS: https://platform.openai.com/docs/guides/text-to-speech - Alternative provider
- WebAIM on TTS: https://webaim.org/articles/visual/ - Accessibility perspective
TL;DR: Apple's TTS is inadequate for users with disabilities who need to listen to long documents. VoiceInk can solve this by integrating premium TTS providers, creating a complete accessibility suite. Implementation is ready in PR #354. This would make VoiceInk the go-to Mac app for accessibility.