# Feature Request: SenseVoice ASR Model and Improved Model Ranking

## Overview
This issue documents the benefits of adding the SenseVoice ASR model and an improved model ranking system to VoiceInk.
Related PR: #410
## Why SenseVoice?

### Performance Benefits
| Metric | SenseVoice | Whisper Large v3 |
|---|---|---|
| Speed | 15x faster | Baseline |
| Accuracy | 96% | 97% |
| Model Size | 234 MB | 1.5 GB |
| RAM Usage | ~800 MB | ~3 GB |
SenseVoice delivers near-Whisper accuracy at a fraction of the computational cost, making it ideal for:
- Users with limited RAM or older hardware
- Scenarios requiring real-time transcription
- Battery-conscious laptop users
### Language Support
SenseVoice excels at Asian languages that Whisper sometimes struggles with:
- Chinese (Mandarin) - Primary optimization target
- Cantonese - Often underserved by other models
- Japanese - Excellent kanji recognition
- Korean - Strong hangul support
- English - Competitive with Whisper
This makes VoiceInk more accessible to users in Asia-Pacific regions.
## Why Improved Model Ranking?

### The Problem
Currently, models are sorted in a fixed order that doesn't reflect their actual performance characteristics. Users must manually research which models offer the best balance of speed and accuracy.
### The Solution
A dynamic ranking system scores each model by the geometric mean of its accuracy and speed ratings, plus a conditional bonus (a brief sketch follows the list below):

`score = sqrt(accuracy × speed) + bonus`
This approach:
- Rewards balance - Models must excel at BOTH metrics to rank high
- Penalizes extremes - A model with 99% speed but 50% accuracy ranks lower than one with 80%/80%
- Applies a bonus - High performers (accuracy >= 0.94 AND speed >= 0.75) get a +0.1 boost
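
As a rough illustration of the ranking math, here is a minimal Swift sketch, assuming models expose normalized 0–1 accuracy and speed ratings. The `RankedModel` type and the `accuracyRating`/`speedRating` names are hypothetical placeholders, not the identifiers used in PR #410.

```swift
// Hypothetical sketch of the ranking score described above.
struct RankedModel {
    let name: String
    let accuracyRating: Double  // 0.0 ... 1.0
    let speedRating: Double     // 0.0 ... 1.0

    /// Geometric mean of accuracy and speed, plus a +0.1 bonus
    /// for models that are strong on both axes.
    var rankingScore: Double {
        let base = (accuracyRating * speedRating).squareRoot()
        let bonus = (accuracyRating >= 0.94 && speedRating >= 0.75) ? 0.1 : 0.0
        return base + bonus
    }
}

// Example: a balanced 80%/80% model outranks an extreme 50%/99% one.
let models = [
    RankedModel(name: "Balanced", accuracyRating: 0.80, speedRating: 0.80), // sqrt(0.64) = 0.80
    RankedModel(name: "Extreme",  accuracyRating: 0.50, speedRating: 0.99), // sqrt(0.495) ≈ 0.70
]

// Sorting by the score in descending order yields the display order.
let ranked = models.sorted { $0.rankingScore > $1.rankingScore }
for model in ranked {
    print(model.name, model.rankingScore)
}
```
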
### User Benefits
- Faster model selection - Best options appear first
- Informed decisions - Speed/accuracy ratings are visible on all model cards
- Consistent experience - Same ranking logic across Recommended, Local, and Cloud tabs
## Summary
These additions would make VoiceInk:
- More performant - a 15x faster transcription option
- More accessible - Better support for Asian languages
- More user-friendly - Intelligent model recommendations
See PR #410 for the complete implementation.