
Feature Request: SenseVoice ASR Model and Improved Model Ranking

Open tmm22 opened this issue 1 month ago • 0 comments

Overview

This issue documents the benefits of adding the SenseVoice ASR model and an improved model ranking system to VoiceInk.

Related PR: #410


Why SenseVoice?

Performance Benefits

Metric        SenseVoice    Whisper Large v3
Speed         15x faster    Baseline
Accuracy      96%           97%
Model Size    234 MB        1.5 GB
RAM Usage     ~800 MB       ~3 GB

SenseVoice comes within one percentage point of Whisper Large v3's accuracy while using roughly a quarter of the RAM and a model file about one-sixth the size, making it ideal for:

  • Users with limited RAM or older hardware
  • Scenarios requiring real-time transcription
  • Battery-conscious laptop users

Language Support

SenseVoice excels at Asian languages that Whisper sometimes struggles with:

  • Chinese (Mandarin) - Primary optimization target
  • Cantonese - Often underserved by other models
  • Japanese - Excellent kanji recognition
  • Korean - Strong hangul support
  • English - Competitive with Whisper

This makes VoiceInk more accessible to users in Asia-Pacific regions.


Why Improved Model Ranking?

The Problem

Currently, models are sorted in a fixed order that doesn't reflect their actual performance characteristics. Users must manually research which models offer the best balance of speed and accuracy.

The Solution

A dynamic ranking system using a geometric mean score:

score = sqrt(accuracy × speed) + bonus

This approach:

  1. Rewards balance - Models must excel at BOTH metrics to rank high
  2. Penalizes extremes - A model with 99% speed but 50% accuracy (base score ≈ 0.70) ranks lower than one with 80%/80% (base score 0.80)
  3. Applies bonus - High performers (accuracy >= 0.94 AND speed >= 0.75) get a +0.1 boost
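
To make the scoring rule above concrete, here is a minimal Python sketch. It is not the code from PR #410. The `ModelRating` type, the function names, and the sample ratings are hypothetical, and it assumes accuracy and speed are both normalized to the 0.0-1.0 range, as the bonus thresholds suggest.

```python
import math
from dataclasses import dataclass

# Thresholds and bonus value as stated in the issue text.
BONUS = 0.1
BONUS_MIN_ACCURACY = 0.94
BONUS_MIN_SPEED = 0.75


@dataclass(frozen=True)
class ModelRating:
    """Hypothetical container; the field names are assumptions."""
    name: str
    accuracy: float  # assumed normalized to 0.0-1.0
    speed: float     # assumed normalized to 0.0-1.0


def ranking_score(model: ModelRating) -> float:
    """score = sqrt(accuracy * speed) + bonus"""
    base = math.sqrt(model.accuracy * model.speed)
    qualifies = (model.accuracy >= BONUS_MIN_ACCURACY
                 and model.speed >= BONUS_MIN_SPEED)
    return base + (BONUS if qualifies else 0.0)


def rank_models(models: list[ModelRating]) -> list[ModelRating]:
    """Return models sorted best-first by ranking score."""
    return sorted(models, key=ranking_score, reverse=True)


# Illustrative ratings only; these are not measured values.
models = [
    ModelRating("fast-but-inaccurate", accuracy=0.50, speed=0.99),
    ModelRating("balanced", accuracy=0.80, speed=0.80),
    ModelRating("high-performer", accuracy=0.95, speed=0.80),
]

ranked_names = [m.name for m in rank_models(models)]
for m in rank_models(models):
    print(f"{m.name}: {ranking_score(m):.3f}")
```

With these sample ratings, the 80%/80% model outranks the 99%-speed/50%-accuracy one, and only the third model receives the bonus. Applying the same `rank_models` function in every tab would be one way to keep the ordering consistent.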

User Benefits

  • Faster model selection - Best options appear first
  • Informed decisions - Speed/accuracy ratings visible on all model cards
  • Consistent experience - Same ranking logic across Recommended, Local, and Cloud tabs

Summary

These additions would make VoiceInk:

  • More performant - A transcription option 15x faster than Whisper Large v3
  • More accessible - Better support for Asian languages
  • More user-friendly - Intelligent model recommendations

See PR #410 for the complete implementation.

tmm22 avatar Nov 28 '25 00:11 tmm22