Add Model harness with WER score and tests (STT)

Open Blaizzy opened this issue 9 months ago • 0 comments

Description

We need a model harness for evaluating Speech-to-Text (STT) models. The harness should report Word Error Rate (WER) as the primary performance metric and ship with comprehensive tests.

Requirements

Implement a model harness that can load and evaluate any STT model in our system
Calculate WER as the primary metric
Support additional metrics where appropriate (Character Error Rate (CER), BLEU, etc.)
Provide test utilities to generate synthetic audio for testing edge cases
Create benchmark test suite with standard datasets (LibriSpeech, Common Voice, etc.)
Support different audio formats and sampling rates
Generate comprehensive reports with per-utterance and aggregate scores
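
To make the requirements above concrete, here is a minimal sketch of how WER, CER, and a per-utterance plus aggregate report could fit together. It is not the project's actual API: `evaluate`, `UtteranceScore`, `normalize`, and the `transcribe` callable are hypothetical names, and the normalization (lowercasing and whitespace collapsing only) is an assumption that a real harness would likely extend. WER is computed as word-level edit distance divided by the number of reference words, and the aggregate is corpus-level (total errors over total reference words) rather than a mean of per-utterance scores.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumption: lowercase and collapse whitespace only; punctuation is kept.
    return " ".join(text.lower().split())


def _rate(errors: int, ref_len: int) -> float:
    # Assumption: an empty reference scores 0.0 if the hypothesis is also
    # empty and 1.0 otherwise. Other conventions exist.
    if ref_len == 0:
        return 0.0 if errors == 0 else 1.0
    return errors / ref_len


@dataclass
class UtteranceScore:
    utt_id: str
    word_errors: int
    ref_words: int
    char_errors: int
    ref_chars: int

    @property
    def wer(self) -> float:
        return _rate(self.word_errors, self.ref_words)

    @property
    def cer(self) -> float:
        return _rate(self.char_errors, self.ref_chars)


def evaluate(
    transcribe: Callable[[Any], str],
    samples: Iterable[Tuple[str, Any, str]],
) -> Tuple[List[UtteranceScore], Dict[str, float]]:
    """Score (utt_id, audio, reference) samples with a hypothetical model callable."""
    scores = []
    for utt_id, audio, reference in samples:
        ref = normalize(reference)
        hyp = normalize(transcribe(audio))
        scores.append(UtteranceScore(
            utt_id=utt_id,
            word_errors=edit_distance(ref.split(), hyp.split()),
            ref_words=len(ref.split()),
            char_errors=edit_distance(ref, hyp),  # characters, spaces included
            ref_chars=len(ref),
        ))
    aggregate = {
        "wer": _rate(sum(s.word_errors for s in scores), sum(s.ref_words for s in scores)),
        "cer": _rate(sum(s.char_errors for s in scores), sum(s.ref_chars for s in scores)),
    }
    return scores, aggregate


if __name__ == "__main__":
    # Stand-in "model": maps a fake audio handle to a fixed transcript.
    fake_outputs = {"a1": "the cat sit on mat", "a2": "hello world"}
    samples = [("u1", "a1", "The cat sat on the mat"), ("u2", "a2", "hello world")]
    per_utt, agg = evaluate(lambda audio: fake_outputs[audio], samples)
    for s in per_utt:
        print(f"{s.utt_id}: WER={s.wer:.3f} CER={s.cer:.3f}")
    print(f"aggregate WER={agg['wer']:.3f}")
```

Taking the model as a plain callable keeps the scoring code independent of how any particular STT model is loaded, so loading, audio decoding, and resampling can live in separate adapters.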

Acceptance Criteria

[x] Model harness successfully loads and evaluates STT models
[x] WER calculation matches reference implementation (tested against known examples)
[x] Test suite covers at least 3 standard STT datasets
[x] Documentation includes examples and explanation of metrics
[x] CI integration ensures WER doesn't regress on benchmark datasets
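
The CI criterion above could be enforced with a small gate that compares freshly measured WER against stored baselines. The sketch below is one possible shape, not an existing part of the project: `check_wer_regression`, the dataset keys, the tolerance value, and the numbers in the demo are all hypothetical placeholders, not measured results.

```python
from typing import Dict, List


def check_wer_regression(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.005,
) -> List[str]:
    """Return one message per dataset whose WER is worse than baseline + tolerance.

    A dataset present in the baseline but missing from the current run is
    also reported, so a silently skipped benchmark cannot pass the gate.
    """
    failures = []
    for name, base_wer in sorted(baseline.items()):
        if name not in current:
            failures.append(f"{name}: no result reported")
        elif current[name] > base_wer + tolerance:
            failures.append(
                f"{name}: WER {current[name]:.4f} exceeds baseline "
                f"{base_wer:.4f} by more than {tolerance}"
            )
    return failures


if __name__ == "__main__":
    # Placeholder values for illustration only.
    baseline = {"dataset_a": 0.050, "dataset_b": 0.120}
    current = {"dataset_a": 0.048, "dataset_b": 0.140}
    for message in check_wer_regression(current, baseline):
        print("REGRESSION:", message)
```

In a CI job, a non-empty return value would typically be turned into a non-zero exit code. The tolerance absorbs small run-to-run noise, and its value is a judgment call that depends on dataset size.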

May 09 '25 18:05 Blaizzy