
Decreasing Speed and Delayed Confirmation in Stream Transcription Over Time

Open gavin1818 opened this issue 5 months ago • 7 comments

I’ve been using WhisperKit for real-time stream transcription in a project. After roughly 20-30 minutes of continuous use, the transcription speed drops noticeably. The transcript also stays unconfirmed for long stretches: the same text keeps repeating inside the unconfirmed segment, so the latest text is not delivered in a timely manner. The result is a significant gap between the audio and the corresponding transcription.
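To make that gap measurable rather than anecdotal, one option is to log, on every transcription update, how far the last confirmed segment trails the captured audio. The following is only a minimal sketch under stated assumptions: `LagTracker`, `record`, and `lagGrowthRate` are hypothetical names and not part of WhisperKit, and the caller is assumed to supply the captured audio length and the end timestamp of the last confirmed segment from its own streaming loop.

```swift
import Foundation

/// Hypothetical helper (not a WhisperKit type): tracks how far the
/// confirmed transcript trails the captured audio over a session.
struct LagTracker {
    /// One sample per transcription update.
    var samples: [(elapsed: Double, lag: Double)] = []

    /// - Parameters:
    ///   - elapsed: seconds since streaming started
    ///   - audioSeconds: total audio captured so far, in seconds
    ///   - confirmedEnd: end time of the last confirmed segment, in seconds
    mutating func record(elapsed: Double, audioSeconds: Double, confirmedEnd: Double) {
        let lag = max(0.0, audioSeconds - confirmedEnd)
        samples.append((elapsed: elapsed, lag: lag))
    }

    /// Least-squares slope of lag against elapsed time, i.e. seconds of
    /// lag gained per second of streaming. Near zero means transcription
    /// keeps up; a clearly positive value means it is falling behind.
    func lagGrowthRate() -> Double? {
        guard samples.count >= 2 else { return nil }
        let n = Double(samples.count)
        let meanX = samples.reduce(0.0) { $0 + $1.elapsed } / n
        let meanY = samples.reduce(0.0) { $0 + $1.lag } / n
        var numerator = 0.0
        var denominator = 0.0
        for s in samples {
            numerator += (s.elapsed - meanX) * (s.lag - meanY)
            denominator += (s.elapsed - meanX) * (s.elapsed - meanX)
        }
        return denominator == 0 ? nil : numerator / denominator
    }
}

// Usage with made-up numbers, purely for illustration (not measured data).
var tracker = LagTracker()
tracker.record(elapsed: 0, audioSeconds: 0, confirmedEnd: 0)
tracker.record(elapsed: 600, audioSeconds: 600, confirmedEnd: 598)
tracker.record(elapsed: 1200, audioSeconds: 1200, confirmedEnd: 1196)
if let rate = tracker.lagGrowthRate() {
    print("lag growth (s of lag per s of streaming):", rate)
}
```

Plotting the recorded lag against elapsed time would show whether the slowdown sets in gradually or abruptly around the 20-30 minute mark, which may help narrow down where to look.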

I’m aware that this issue might be challenging to resolve quickly, but I’m curious about the potential causes. Could it be related to model-level issues, decoder-level issues, or something else?

I would appreciate any insights into which areas might be the most likely cause of the issue. If there are specific parts of the code or certain tools I should use to investigate these potential causes further, I’d be grateful for the guidance.

Thanks

gavin1818, Aug 27 '24 23:08