Add High-Quality Text-to-Speech for Accessibility

Open tmm22 opened this issue 1 month ago • 0 comments

Problem Statement

Users with disabilities who rely on text-to-speech (TTS) face significant barriers when using macOS:

Apple's Built-in TTS Falls Short for Accessibility Needs

Critical Issues:

  1. Unusable for Long Documents 📚

    • Robotic, monotone voices become exhausting after 5-10 minutes
    • Poor intonation makes comprehension difficult
    • No emotional context or natural pacing
    • Users with visual impairments or dyslexia need to listen for hours daily
  2. Inconsistent Voice Quality 🎭

    • Some voices are barely intelligible
    • Quality varies dramatically between languages
    • Many accents/dialects have poor representation
    • No control over voice characteristics
  3. Limited Customization 🎚️

    • Can't adjust emotional tone, stability, or clarity
    • Speed control is basic (no fine-tuning)
    • No voice style options for different content types
    • One-size-fits-all doesn't work for accessibility
  4. Poor Language Support 🌍

    • Non-English voices often sound worse
    • Limited dialect variations
    • Pronunciation issues with technical terms
    • No customization for names or specialized vocabulary

The Impact on Disabled Users

Real-world consequences:

  • Students with dyslexia struggle to consume course materials
  • Users with visual impairments face fatigue from poor voice quality
  • People with ADHD can't focus due to monotone delivery
  • Non-native speakers can't rely on TTS for learning
  • Professionals can't use TTS for work documents (too exhausting)

Current "solutions" don't work:

  • Web-based TTS services: Require copying/pasting, no offline access, poor UX
  • Separate Mac apps: Force users to juggle multiple tools, break workflows
  • Mobile TTS apps: Not practical for desktop work, small screens
  • Browser extensions: Limited to web content, inconsistent quality

Why macOS Needs Better Native TTS Support

The gap:

  • macOS has excellent Speech-to-Text (dictation)
  • macOS has terrible Text-to-Speech (speech synthesis) for accessibility
  • No native solution bridges this gap

What users need:

  • Natural-sounding voices comfortable for extended listening
  • Integration with existing workflows
  • Offline capability with online premium options
  • Control over voice characteristics and pacing
  • Support for custom pronunciation and terminology

Why VoiceInk is the Perfect Solution

VoiceInk already excels at Speech-to-Text (transcription via Whisper). Adding Text-to-Speech creates a complete accessibility suite that no other Mac app provides:

The Perfect Combination 🎯

Speech → Text (VoiceInk's existing strength)
    ↓
  ✨ NEW ✨
    ↓
Text → Speech (What this feature adds)

Bidirectional communication:

  1. Voice to Text → Transcribe meetings, dictate notes, capture thoughts
  2. Text to Voice → Listen to documents, hear your writing, consume content
  3. Round-Trip Workflows → Record audio, transcribe, edit text, listen back with premium voices

No other Mac app does this. VoiceInk would become THE go-to accessibility tool.

Why This Belongs in VoiceInk

Natural fit:

  • ✅ Already accessibility-focused - VoiceInk's mission aligns with disability support
  • ✅ Complements existing features - Completes the "voice interface" story
  • ✅ Same user base - People who need STT often need TTS
  • ✅ Unified workflow - One app for all voice/text needs
  • ✅ Local-first - Matches VoiceInk's privacy focus (offline option)

Strategic positioning:

  • 🚀 Market differentiation - Unique feature combination
  • 💪 Category leader - From "great transcription" to "complete accessibility suite"
  • 🏆 Competitive moat - Hard for competitors to replicate this combo
  • 💰 Value proposition - Justifies premium pricing with premium voices

Proposed Solution

Integrate high-quality Text-to-Speech using premium providers (ElevenLabs, OpenAI, Google Cloud TTS) with a free fallback to the built-in macOS voices.
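
To make the fallback behaviour concrete, here is a minimal sketch of how a provider could be chosen. It is not taken from PR #354: the `TTSProvider` enum, the `resolveProvider` function, and the idea of passing stored keys as a dictionary are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical provider list. The cases mirror the providers named in this
// issue, not VoiceInk's actual types.
enum TTSProvider {
    case elevenLabs, openAI, googleCloud, macOS

    /// Only the built-in macOS voices work without an API key.
    var requiresAPIKey: Bool { self != .macOS }
}

/// Returns the first preferred provider that is usable, falling back to
/// macOS voices. `apiKeys` is assumed to come from secure key storage.
func resolveProvider(preferred: [TTSProvider],
                     apiKeys: [TTSProvider: String]) -> TTSProvider {
    for provider in preferred {
        // A provider that needs no key is always usable.
        if !provider.requiresAPIKey { return provider }
        // A cloud provider is usable only with a non-empty key.
        if let key = apiKeys[provider], !key.isEmpty { return provider }
    }
    // Nothing in the preference list is usable: use the free built-in voices.
    return .macOS
}
```

With no keys configured, `resolveProvider(preferred: [.elevenLabs, .openAI], apiKeys: [:])` resolves to `.macOS`, which is what lets the feature work immediately after install.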

Key Features

Core Functionality:

  • Multiple provider support: ElevenLabs, OpenAI, Google Cloud TTS, + built-in macOS
  • Simple workflow: Select voice → Enter text → Generate
  • Voice preview system before generating
  • Batch generation for long documents
  • Audio export (.m4a, .mp3, .wav)
  • Playback controls: Speed (0.5×-2×), looping, timeline scrubbing
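
Batch generation for long documents usually means splitting the text into requests that fit a provider's per-request limit. The sketch below shows one simple way to do that, assuming a hypothetical `chunkText` helper and a caller-supplied `maxLength`; it is illustrative and not the approach used in PR #354. Sentence detection is deliberately naive (it splits after `.`, `!`, and `?`), so abbreviations such as "e.g." will also cause a break.

```swift
import Foundation

/// Splits `text` into chunks of at most `maxLength` characters, breaking
/// after sentence-ending punctuation where possible. `maxLength` is a
/// hypothetical per-request limit; real providers publish their own.
func chunkText(_ text: String, maxLength: Int) -> [String] {
    precondition(maxLength > 0, "maxLength must be positive")

    // Pass 1: naive sentence split on ., ! and ?.
    var sentences: [String] = []
    var current = ""
    for character in text {
        current.append(character)
        if ".!?".contains(character) {
            sentences.append(current)
            current = ""
        }
    }
    if !current.isEmpty { sentences.append(current) }

    // Pass 2: pack sentences greedily into chunks.
    var chunks: [String] = []
    var chunk = ""
    for sentence in sentences {
        var piece = sentence.trimmingCharacters(in: .whitespacesAndNewlines)
        if piece.isEmpty { continue }
        // Hard-split any single sentence that is longer than the limit.
        while piece.count > maxLength {
            if !chunk.isEmpty { chunks.append(chunk); chunk = "" }
            chunks.append(String(piece.prefix(maxLength)))
            piece = String(piece.dropFirst(maxLength))
        }
        if chunk.isEmpty {
            chunk = piece
        } else if chunk.count + 1 + piece.count <= maxLength {
            chunk += " " + piece
        } else {
            chunks.append(chunk)
            chunk = piece
        }
    }
    if !chunk.isEmpty { chunks.append(chunk) }
    return chunks
}
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order. Breaking at sentence boundaries keeps the pauses between chunks from landing mid-sentence.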

Accessibility Features:

  • Works immediately with macOS voices (no API keys required)
  • Natural voices from ElevenLabs/OpenAI for comfort during long listening
  • Voice style controls (emotion, stability, clarity)
  • Pronunciation glossary for custom terms
  • Translation support (50+ languages)
  • Speed adjustment for different comprehension needs
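
One plausible way to implement a pronunciation glossary is to rewrite known terms into a phonetic respelling before the text is sent to any provider. The sketch below assumes a hypothetical `applyGlossary` function and illustrative glossary entries; it is not the mechanism in PR #354, and providers that accept their own pronunciation dictionaries may be a better fit where available.

```swift
import Foundation

/// Replaces whole-word glossary terms with a phonetic respelling before
/// synthesis. Matching is case-insensitive. Glossary contents are illustrative.
func applyGlossary(_ glossary: [String: String], to text: String) -> String {
    var result = text
    // Longest terms first so a multi-word term wins over a shorter one it
    // contains; ties are broken alphabetically to keep the order deterministic.
    let terms = glossary.keys.sorted {
        $0.count != $1.count ? $0.count > $1.count : $0 < $1
    }
    for term in terms {
        guard let replacement = glossary[term] else { continue }
        // \b restricts matches to whole words; the term itself is escaped so
        // it is treated literally rather than as a regular expression.
        let pattern = "\\b" + NSRegularExpression.escapedPattern(for: term) + "\\b"
        result = result.replacingOccurrences(
            of: pattern,
            with: NSRegularExpression.escapedTemplate(for: replacement),
            options: [.regularExpression, .caseInsensitive]
        )
    }
    return result
}
```

For example, a glossary of `["SQL": "sequel", "nginx": "engine x"]` turns "Restart nginx, then run the SQL query." into "Restart engine x, then run the sequel query." Two limitations of this sketch: the `\b` boundaries only work for terms that start and end with a letter or digit (a term like "C++" will not match), and replacements are applied one after another, so a respelling that itself contains another glossary term would be rewritten again.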

Advanced Capabilities:

  • URL import: Extract and speak web articles
  • Text snippets library: Save commonly used phrases
  • Transcript generation: Export subtitles with timestamps
  • Cost estimation: Transparent pricing for cloud providers
  • Generation history: Replay previous outputs
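
As an illustration of the transcript export item, the sketch below renders timed segments in the widely used SubRip (.srt) layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The `Cue` type and both function names are hypothetical, and the issue does not say which subtitle format PR #354 actually exports.

```swift
import Foundation

/// One spoken segment with start/end offsets in seconds (hypothetical type).
struct Cue {
    let start: Double
    let end: Double
    let text: String
}

/// Formats seconds as an SRT timestamp: HH:MM:SS,mmm.
func srtTimestamp(_ seconds: Double) -> String {
    // Work in whole milliseconds to avoid floating-point drift; clamp negatives to 0.
    let totalMillis = Int((max(0, seconds) * 1000).rounded())
    let millis = totalMillis % 1000
    let totalSeconds = totalMillis / 1000
    return String(format: "%02d:%02d:%02d,%03d",
                  totalSeconds / 3600,
                  (totalSeconds / 60) % 60,
                  totalSeconds % 60,
                  millis)
}

/// Renders cues as SubRip text: index, time range, text, blank line between cues.
func renderSRT(_ cues: [Cue]) -> String {
    let blocks = cues.enumerated().map { pair -> String in
        let range = "\(srtTimestamp(pair.element.start)) --> \(srtTimestamp(pair.element.end))"
        return "\(pair.offset + 1)\n\(range)\n\(pair.element.text)\n"
    }
    return blocks.joined(separator: "\n")
}
```

The cue timings would have to come from the provider (where it reports them) or be estimated from the generated audio's duration; that part is outside this sketch.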

Why Premium Voices Matter for Accessibility

ElevenLabs/OpenAI vs Apple TTS:

Feature         | Apple TTS              | Premium TTS (ElevenLabs/OpenAI)
Naturalness     | ❌ Robotic             | ✅ Human-like
Long listening  | ❌ Exhausting (10 min) | ✅ Comfortable (hours)
Emotional range | ❌ Monotone            | ✅ Expressive
Customization   | ❌ Minimal             | ✅ Extensive
Pronunciation   | ❌ Poor                | ✅ Excellent
Multiple voices | ❌ Limited             | ✅ Hundreds

For accessibility, quality isn't a luxury; it's essential.


Real-World Use Cases

For Users with Visual Impairments

  • Current: Struggle with robotic macOS voices, fatigue quickly
  • With VoiceInk: Use premium voices for natural listening experience
  • Workflow: Import documents → Generate audio → Listen comfortably for hours

For Users with Dyslexia

  • Current: Reading long documents is exhausting and error-prone
  • With VoiceInk: Import web articles or PDFs → Listen with adjustable speed
  • Workflow: Copy text → Paste in VoiceInk → Generate → Listen while doing other tasks

For Students with Learning Disabilities

  • Current: Struggle to proofread written work by reading
  • With VoiceInk: Transcribe assignments by voice → Listen back to catch errors
  • Workflow: Dictate essay → Edit transcript → Hear it read back → Submit confidently

For Professionals Who Need Both

  • Current: Use separate apps for dictation and TTS (clunky)
  • With VoiceInk: One app for all voice/text needs
  • Workflow:
    • Transcribe meeting notes
    • Edit and clean up transcript
    • Generate audio summary
    • Share both text and audio with team

Why This is Urgent

The accessibility community needs this now:

  1. Existing solutions are inadequate

    • Apple TTS is painful for long-form content
    • Web services are fragmented and require internet
    • No unified Mac solution exists
  2. Remote work/education increased TTS demand

    • More digital content to consume
    • More long-form documents
    • More need for multimodal accessibility
  3. AI voice quality is finally good enough

    • ElevenLabs and OpenAI TTS are production-ready
    • Natural enough for daily use
    • Affordable for individual users
  4. VoiceInk has the infrastructure

    • Already handles audio processing
    • Already has API integration patterns
    • Already has the right user base

Implementation Available

A full implementation of this feature is ready for review:

Pull Request #354: Add Text-to-Speech as Accessibility Feature

What's included:

  • ✅ 55 new Swift files (8,592 lines of code)
  • ✅ Complete TTS workspace with modern UI
  • ✅ Support for ElevenLabs, OpenAI, Google Cloud TTS, macOS voices
  • ✅ Settings integration with secure API key storage
  • ✅ Batch processing, audio export, playback controls
  • ✅ Zero breaking changes (purely additive)
  • ✅ Consistent with VoiceInk's design system
  • ✅ Well-documented and tested

The work is done. This issue is to discuss whether to merge it.


Discussion Points

For the VoiceInk community to consider:

  1. Does this align with VoiceInk's mission?

    • Is completing the voice/text accessibility loop worthwhile?
    • Should VoiceInk be "just transcription" or a "complete accessibility suite"?
  2. Is the implementation acceptable?

    • Review PR #354 for code quality, architecture, UX
    • Are there concerns about maintenance burden?
    • Does it integrate well with existing features?
  3. What about scope creep?

    • Is this "feature bloat" or natural evolution?
    • Does it enhance or distract from core transcription?
    • How do users feel about TTS in a transcription app?
  4. Accessibility priority?

    • How important is it to serve users with disabilities?
    • Is solving the "Apple TTS problem" valuable to the community?
    • Would this make VoiceInk more inclusive?

Alternatives Considered

Why not just tell users to use other TTS apps?

  1. Workflow fragmentation - Forces context switching between apps
  2. No integration - Can't leverage VoiceInk's existing transcripts
  3. Poor UX - Separate apps don't understand each other
  4. Cost - Premium TTS apps cost $20-50/month separately

Why not just improve Apple's TTS?

  1. Not in our control - We can't fix Apple's voices
  2. Slow progress - Apple TTS hasn't improved significantly in years
  3. Immediate solution - Premium APIs available now

Why not just wait for Apple to fix it?

  1. No indication they will - Apple TTS has been mediocre for a decade
  2. Users need help now - People with disabilities can't wait
  3. Competitive advantage - VoiceInk can lead where Apple lags

Call to Action

For maintainers: Please review PR #354 and consider merging this accessibility-focused feature.

For users: If you need better TTS on Mac, please comment with your use case. Your voice matters.

For accessibility advocates: Share this issue with communities who would benefit from better Mac TTS.


Related Links

  • Pull Request: #354 - Full implementation ready for review
  • ElevenLabs TTS: https://elevenlabs.io - Example of premium voice quality
  • OpenAI TTS: https://platform.openai.com/docs/guides/text-to-speech - Alternative provider
  • WebAIM on TTS: https://webaim.org/articles/visual/ - Accessibility perspective

TL;DR: Apple's TTS is inadequate for users with disabilities who need to listen to long documents. VoiceInk can solve this by integrating premium TTS providers, creating a complete accessibility suite. Implementation is ready in PR #354. This would make VoiceInk the go-to Mac app for accessibility.

Posted by tmm22, Nov 01 '25 07:11