VoiceInk
feat: add SenseVoice ASR model and improved model ranking
Summary
This PR adds support for Alibaba's SenseVoice ASR model and implements an improved model ranking system.
New Features
SenseVoice ASR Model Integration
- SenseVoice - Alibaba's ultra-fast multilingual ASR model
- Supports Chinese, Cantonese, English, Japanese, Korean
- 15x faster than Whisper with 96% accuracy
- 234 MB model size using ONNX Runtime inference
- Uses LFR (Low Frame Rate) feature stacking for efficiency
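For reviewers unfamiliar with LFR, here is a minimal sketch of the stacking idea, not the code in SenseVoiceTranscriptionService.swift. It concatenates `m` consecutive feature frames and advances `n` frames per step, so the encoder sees roughly `1/n` as many frames. The function name `stackLFR`, the padding scheme (repeat the first frame on the left, the last frame on the right), and the parameter names are assumptions for illustration; the PR does not state the actual window and stride.

```swift
import Foundation

/// Hypothetical helper: stacks `m` consecutive frames with stride `n`.
/// `frames` is [time][featureDim]; the result is [ceil(time / n)][m * featureDim].
func stackLFR(_ frames: [[Float]], m: Int, n: Int) -> [[Float]] {
    guard let first = frames.first, let last = frames.last, m > 0, n > 0 else {
        return []
    }
    // Assumed padding: repeat the first frame so the first window is centered.
    let leftPad = (m - 1) / 2
    let padded = Array(repeating: first, count: leftPad) + frames
    // One output frame per `n` input frames, rounded up.
    let outCount = (frames.count + n - 1) / n

    var out: [[Float]] = []
    out.reserveCapacity(outCount)
    for i in 0..<outCount {
        let start = i * n
        var stacked: [Float] = []
        stacked.reserveCapacity(m * first.count)
        for j in 0..<m {
            let idx = start + j
            // Past the end of the signal, repeat the last frame.
            stacked += idx < padded.count ? padded[idx] : last
        }
        out.append(stacked)
    }
    return out
}
```

With ten one-dimensional frames and `m: 3, n: 2`, this yields five stacked frames of three values each.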
Dynamic Model Ranking System
- Added `speed` and `accuracy` properties to the TranscriptionModel protocol
- Models are now sorted using a geometric mean score: `sqrt(accuracy × speed)`
- Models excelling at BOTH speed AND accuracy rank highest
- Bonus (+0.1) applied for models with high accuracy (>=0.94) AND high speed (>=0.75)
- Sorting applied across Recommended, Local, and Cloud tabs
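The scoring rule above can be sketched as follows. This is an illustration under stated assumptions, not the code in ModelManagementView.swift: `RankedModel` and `DemoModel` are hypothetical stand-ins for the TranscriptionModel protocol, both ratings are assumed to be normalized to 0...1, and the bonus is assumed to be added to the geometric mean before sorting.

```swift
import Foundation

// Hypothetical stand-in for TranscriptionModel; only the two new
// properties described in this PR are modeled.
protocol RankedModel {
    var name: String { get }
    var speed: Double { get }     // assumed range 0...1
    var accuracy: Double { get }  // assumed range 0...1
}

// Hypothetical concrete type used only for the example.
struct DemoModel: RankedModel {
    let name: String
    let speed: Double
    let accuracy: Double
}

/// Geometric mean of accuracy and speed, plus a +0.1 bonus when the model
/// clears both thresholds (accuracy >= 0.94 and speed >= 0.75).
func rankingScore<M: RankedModel>(_ model: M) -> Double {
    let base = (model.accuracy * model.speed).squareRoot()
    let bonus = (model.accuracy >= 0.94 && model.speed >= 0.75) ? 0.1 : 0.0
    return base + bonus
}

/// Returns the models sorted from highest to lowest score.
func ranked<M: RankedModel>(_ models: [M]) -> [M] {
    models.sorted { rankingScore($0) > rankingScore($1) }
}
```

Unlike an arithmetic mean, the geometric mean penalizes a model that is strong on one axis and weak on the other: a made-up model with speed 0.40 and accuracy 0.99 scores about 0.63, below one with 0.95 and 0.96, which scores about 0.95 before the bonus.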
Files Added
| File | Description |
|---|---|
| SenseVoiceTranscriptionService.swift | ONNX-based inference with LFR feature stacking |
| SenseVoiceTokenizer.swift | Token decoding for SenseVoice output format |
| SenseVoiceModelCardView.swift | UI card with speed/accuracy ratings |
| WhisperState+SenseVoice.swift | Model download, delete, and management |
| FastConformerFeatureExtractor.swift | Audio feature extraction for ONNX models |
Files Modified
- TranscriptionModel.swift - Added SenseVoiceModel struct and protocol extensions
- PredefinedModels.swift - Added SenseVoice model definition
- ModelManagementView.swift - Added ranking algorithm and SenseVoice actions
- ModelCardRowView.swift - Added SenseVoice card rendering
- WhisperState.swift - Added service and routing
- project.pbxproj - Added onnxruntime-swift-package-manager dependency
Dependencies
This PR adds the onnxruntime-swift-package-manager package from Microsoft for ONNX Runtime inference.
Testing
- [x] Build succeeds
- [x] Model downloads successfully
- [x] Transcription works correctly
- [x] Model ranking sorts models as expected
Screenshots
The SenseVoice model appears in the Local models tab with speed/accuracy ratings displayed.
Summary by cubic
Adds SenseVoice multilingual ASR via ONNX Runtime and a new ranking that prioritizes models that are both fast and accurate. This brings much faster Asian-language transcription and better model recommendations across tabs.
- New Features
  - SenseVoice ASR integration (zh/yue/en/ja/ko), ~234 MB, up to 15x faster than Whisper.
  - LFR feature stacking, custom tokenizer, and greedy decoding with a fast feature extractor.
  - SenseVoice model card with download/delete/show-in-Finder and progress UI.
  - Ranking updates: added speed/accuracy to TranscriptionModel; geometric-mean score with a small bonus; applied to Recommended, Local, and Cloud.
- Bug Fixes
  - Added SenseVoice routing in AudioFileTranscriptionService and Manager to prevent cloud fallback.
  - More robust downloads and decoding: HTTP status validation and CTC collapse for cleaner text.
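
As a rough illustration of the greedy decoding and CTC collapse mentioned above, the sketch below picks the highest-scoring token per frame, merges consecutive repeats, and drops blanks. The function name `ctcGreedyDecode`, the `[[Float]]` logits layout, and blank ID 0 are assumptions for the example; the real model's blank ID and output format may differ.

```swift
import Foundation

/// Hypothetical helper: greedy CTC decoding.
/// `logits` is [time][vocabSize]; returns token IDs after collapse.
func ctcGreedyDecode(logits: [[Float]], blankID: Int = 0) -> [Int] {
    var tokens: [Int] = []
    var previous: Int? = nil
    for frame in logits {
        // Argmax over the vocabulary for this frame; skip empty frames.
        guard let best = frame.indices.max(by: { frame[$0] < frame[$1] }) else {
            continue
        }
        // Emit only when the token is not blank and differs from the
        // previous frame's token (a blank in between allows a true repeat).
        if best != blankID && best != previous {
            tokens.append(best)
        }
        previous = best
    }
    return tokens
}
```

For a per-frame argmax sequence of 1, 1, blank, 1, 2, 2, blank, the collapsed output is 1, 1, 2: the first two frames merge, while the blank keeps the second 1 as a separate token.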
Written for commit 7cb1b5478408792a5ebe65afc61deb265fd0983e. Summary will update automatically on new commits.