Feature Request: SenseVoice ASR Model and Improved Model Ranking

Open tmm22 opened this issue 1 month ago • 0 comments

Overview

This issue documents the benefits of adding the SenseVoice ASR model and an improved model ranking system to VoiceInk.

Related PR: #410

Why SenseVoice?

Performance Benefits

Metric	SenseVoice	Whisper Large v3
Speed	15x faster	Baseline
Accuracy	96%	97%
Model Size	234 MB	1.5 GB
RAM Usage	~800 MB	~3 GB

SenseVoice delivers near-Whisper accuracy at a fraction of the computational cost, making it ideal for:

Users with limited RAM or older hardware
Scenarios requiring real-time transcription
Battery-conscious laptop users

Language Support

SenseVoice excels at Asian languages that Whisper sometimes struggles with:

Chinese (Mandarin) - Primary optimization target
Cantonese - Often underserved by other models
Japanese - Excellent kanji recognition
Korean - Strong hangul support
English - Competitive with Whisper

This makes VoiceInk more accessible to users in Asia-Pacific regions.

Why Improved Model Ranking?

The Problem

Currently, models are sorted in a fixed order that doesn't reflect their actual performance characteristics. Users must manually research which models offer the best balance of speed and accuracy.

The Solution

A dynamic ranking system using a geometric mean score:

score = sqrt(accuracy × speed) + bonus

This approach:

Rewards balance - Models must excel at BOTH metrics to rank high
Penalizes extremes - A model rated 0.99 for speed but 0.50 for accuracy scores sqrt(0.99 × 0.50) ≈ 0.70, below a balanced 0.80/0.80 model, which scores 0.80
Applies bonus - High performers (accuracy >= 0.94 AND speed >= 0.75) get a +0.1 boost
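
To make the formula concrete, here is a minimal sketch of the scoring and sorting in Python. It assumes accuracy and speed are both normalized to the 0.0-1.0 range, as the bonus thresholds suggest. The names `ModelRating`, `ranking_score`, and `rank_models`, and the three sample models, are hypothetical and used only for illustration. The actual implementation in PR #410 may differ.

```python
import math
from dataclasses import dataclass

# Thresholds and bonus value as stated in this issue.
BONUS_ACCURACY = 0.94
BONUS_SPEED = 0.75
BONUS = 0.1


@dataclass(frozen=True)
class ModelRating:
    """Hypothetical container for a model's ratings (assumed 0.0-1.0 scale)."""
    name: str
    accuracy: float
    speed: float


def ranking_score(accuracy: float, speed: float) -> float:
    """Geometric mean of accuracy and speed, plus a bonus for high performers."""
    base = math.sqrt(accuracy * speed)
    qualifies = accuracy >= BONUS_ACCURACY and speed >= BONUS_SPEED
    return base + (BONUS if qualifies else 0.0)


def rank_models(models):
    """Return models sorted from highest to lowest score."""
    return sorted(
        models,
        key=lambda m: ranking_score(m.accuracy, m.speed),
        reverse=True,
    )


# Illustrative ratings only; these are not measured values.
models = [
    ModelRating("fast-but-inaccurate", accuracy=0.50, speed=0.99),
    ModelRating("balanced", accuracy=0.80, speed=0.80),
    ModelRating("high-performer", accuracy=0.95, speed=0.80),
]

ranked = [m.name for m in rank_models(models)]
print(ranked)
```

With these sample ratings, the high performer ranks first because it earns the bonus, and the balanced model ranks above the fast but inaccurate one. A geometric mean is used rather than an arithmetic mean because a low value in either metric pulls the product down sharply.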

User Benefits

Faster model selection - Best options appear first
Informed decisions - Speed/accuracy ratings visible on all model cards
Consistent experience - Same ranking logic across Recommended, Local, and Cloud tabs

Summary

These additions would make VoiceInk:

More performant - A transcription option 15x faster than Whisper Large v3
More accessible - Better support for Asian languages
More user-friendly - Intelligent model recommendations

See PR #410 for the complete implementation.

Nov 28 '25 00:11 tmm22