
feat: implement automatic driver radio transcription

Open kyujin-cho opened this issue 7 months ago • 5 comments

Abstract

This patch adds a new feature that displays a transcript of every driver radio message.

[Image: Screenshot 2024-06-30 at 5 46 24 PM]

What's changed

live-backend

  • New /api/audio API added. F1TV's live timing CDN (https://livetiming.formula1.com/static) does not permit cross-origin requests, so every call that fetches a speech file must be proxied through the backend. Because routing every file request through live-backend can marginally increase its traffic load (and risks an IP ban from the F1TV CDN), I have made this API an optional feature: it is enabled only when the ENABLE_AUDIO_FETCH environment variable is defined when the server process starts.
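The opt-in behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `isAudioFetchEnabled` and `resolveAudioUrl` are hypothetical names, the path checks are an assumption about what a careful proxy might do, and the environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Upstream CDN base named in the description above.
const CDN_BASE = "https://livetiming.formula1.com/static";

// Minimal environment shape so the sketch does not depend on Node typings.
type Env = Record<string, string | undefined>;

// The feature is opt-in: it is on only when ENABLE_AUDIO_FETCH is defined.
function isAudioFetchEnabled(env: Env): boolean {
  return env["ENABLE_AUDIO_FETCH"] !== undefined;
}

// Hypothetical helper: map a client-supplied relative path to an upstream URL,
// or return null when the feature is disabled or the path looks unsafe.
function resolveAudioUrl(env: Env, path: string): string | null {
  if (!isAudioFetchEnabled(env)) return null;
  // Reject absolute URLs and parent-directory segments so the proxy
  // can only reach files under CDN_BASE.
  if (path.includes("://") || path.split("/").includes("..")) return null;
  // Drop leading slashes so the path is always joined relative to CDN_BASE.
  const trimmed = path.replace(/^\/+/, "");
  if (trimmed.length === 0) return null;
  return `${CDN_BASE}/${trimmed}`;
}
```

A handler for /api/audio could call `resolveAudioUrl` first and answer with an error status when it returns null, so that a server started without ENABLE_AUDIO_FETCH never contacts the CDN.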

dash

  • Automatic Speech Recognition pipeline. This pipeline accepts sampled audio data and infers the transcription using Transformers.js and OpenAI's Whisper model. There are many Whisper-based models; based on my experience, I have made three of them available as options in this project (see dash/src/app/(nav)/settings/page.tsx). For now, these options are labeled More Quality, Balanced and Low Latency.
    That said, running the pipeline only consumes the computational resources of the client's browser; the API backend takes no part in the process.
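As a rough sketch of how such a client-side pipeline might be wired, the snippet below maps the three setting labels to Whisper checkpoints and passes prepared audio to a recognizer. The model IDs are assumptions chosen for illustration (the actual choices live in the settings page), and `toMono`, `transcribeRadio` and the `Transcriber` type are hypothetical names. The recognizer is injected as a plain function so the sketch does not depend on any library.

```typescript
// The three quality labels come from the description; the model IDs are
// assumptions picked for illustration, not necessarily the patch's choices.
type Quality = "More Quality" | "Balanced" | "Low Latency";

const MODEL_BY_QUALITY: Record<Quality, string> = {
  "More Quality": "Xenova/whisper-small",
  "Balanced": "Xenova/whisper-base",
  "Low Latency": "Xenova/whisper-tiny",
};

// Average two channels into one. Whisper models expect mono audio
// (sampled at 16 kHz; resampling is out of scope for this sketch).
function toMono(left: Float32Array, right: Float32Array): Float32Array {
  const n = Math.min(left.length, right.length);
  const mono = new Float32Array(n);
  for (let i = 0; i < n; i++) mono[i] = (left[i] + right[i]) / 2;
  return mono;
}

// Hypothetical shape of the injected recognizer: audio in, text out.
type Transcriber = (audio: Float32Array) => Promise<{ text: string }>;

// Run the recognizer on prepared audio and return the cleaned-up transcript.
async function transcribeRadio(audio: Float32Array, asr: Transcriber): Promise<string> {
  const result = await asr(audio);
  return result.text.trim();
}
```

In the browser, the injected recognizer could wrap a Transformers.js pipeline created with `pipeline("automatic-speech-recognition", MODEL_BY_QUALITY[quality])`. Keeping it behind a function type makes it easy to swap models when the user changes the setting.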

kyujin-cho, Jun 30 '24 09:06