WhisperLiveKit
Can we split at the last sentence?
Since Whisper can only process 30 s of audio at a time, whisper streaming runs Whisper iteratively.
Whisper is run every second (even though the whisper streaming publication says 2 s would be optimal). Words are committed if they have high confidence or if two consecutive runs agree on them.
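To make the "two consecutive runs agree" rule concrete, here is a minimal sketch. It assumes each run yields a plain list of words for the same buffer, and it leaves out the high-confidence path and timestamps. `committed_prefix` is a made-up name for illustration, not a function from WhisperLiveKit or whisper streaming.

```python
def committed_prefix(previous_run, current_run):
    """Return the leading words on which two consecutive runs agree.

    Hypothetical helper: both arguments are lists of words transcribed
    from the same audio buffer on successive iterations.
    """
    agreed = []
    for prev_word, cur_word in zip(previous_run, current_run):
        if prev_word != cur_word:
            break  # the first disagreement ends the stable prefix
        agreed.append(cur_word)
    return agreed


run_1 = ["the", "cat", "sat", "on"]
run_2 = ["the", "cat", "sat", "in", "the"]
print(committed_prefix(run_1, run_2))  # ['the', 'cat', 'sat']
```

Only the prefix is committed here; the words after the first disagreement stay uncommitted and are transcribed again on the next run.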
The same audio buffer is kept and rerun until it is chunked. The chunking can work in two different ways:
- segment: after some time (15 s by default), the audio buffer is chunked at the second-to-last word if it is committed, otherwise at the last committed word.
- sentence: after some time, the audio buffer is chunked at the second-to-last sentence if everything is committed. If no sentence is found after 30 s, the audio is chunked anyway.
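As I read them, the two modes could be sketched roughly like this. This is only my understanding written as code, not the library's implementation. `Word`, `segment_cut` and `sentence_cut` are hypothetical names, and `sentence_ends` is assumed to be the end times of the sentences found in the committed text. Where the forced cut lands after 30 s is not something I know, so the sketch simply reuses the word-level rule there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    end: float        # end time in seconds from the start of the buffer
    committed: bool


def segment_cut(words: List[Word], buffer_len: float,
                threshold: float = 15.0) -> Optional[float]:
    """Word-level rule: second-to-last word if committed, else last committed word."""
    if buffer_len < threshold:
        return None
    if len(words) >= 2 and words[-2].committed:
        return words[-2].end
    committed_ends = [w.end for w in words if w.committed]
    return committed_ends[-1] if committed_ends else None


def sentence_cut(sentence_ends: List[float], words: List[Word], buffer_len: float,
                 threshold: float = 15.0, hard_limit: float = 30.0) -> Optional[float]:
    """Sentence-level rule: cut at the end of the second-to-last committed sentence."""
    if buffer_len < threshold:
        return None
    if len(sentence_ends) >= 2:
        return sentence_ends[-2]
    if buffer_len >= hard_limit:
        # Assumption: fall back to the word-level rule for the forced cut.
        return segment_cut(words, buffer_len, threshold)
    return None
```

Both functions return the time (in seconds from the buffer start) at which to cut, or `None` to keep the buffer as it is.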
If I understand correctly, running Whisper takes the same time whether the audio is 1 s or 30 s long. But the quality is usually better if you have a whole sentence, and if one could run the same sentence from start to end in one go.
From this, I would change the sentence fragmentation so that the audio buffer is chunked at the end of a sentence before 15 s, except the last sentence, or maybe even at the last one if it is complete. At 15 s (or the selected threshold), the audio buffer is cut anyway, even if no sentence is found.
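A rough sketch of the rule I have in mind, to make the proposal easier to discuss. All names are hypothetical: `sentence_ends` is assumed to hold the end times of the sentences in the committed text, `last_sentence_complete` says whether the last of them is finished, and `fallback_cut` is whatever cut point the forced case would use (for example the last committed word).

```python
from typing import List, Optional


def proposed_cut(sentence_ends: List[float], last_sentence_complete: bool,
                 buffer_len: float, fallback_cut: Optional[float],
                 threshold: float = 15.0) -> Optional[float]:
    """Pick a cut point (seconds from buffer start), or None to keep waiting."""
    # Skip the last sentence unless it is known to be complete.
    candidates = sentence_ends if last_sentence_complete else sentence_ends[:-1]
    # Only sentence ends that fall before the threshold qualify.
    candidates = [end for end in candidates if end < threshold]
    if candidates:
        return candidates[-1]   # latest usable sentence boundary
    if buffer_len >= threshold:
        return fallback_cut     # no sentence found: cut anyway
    return None
```

For example, with sentence ends at 4, 9 and 12 s and an unfinished last sentence, the buffer would be cut at 9 s, so the unfinished sentence is transcribed again from its start on the next run.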