
Slow or empty output on Windows with stream example

Open drmbt opened this issue 2 years ago • 1 comments

I've got everything built, SDL2 in-line, and it's anecdotally working.

stream -c 0 -t 8 --length 500 -vth 0.1

is the only setting I've been able to consistently get some sort of output from, but it's slow and very intermittent. I might get two or three words from a 30-second burst of speech, and they sometimes take minutes to show up.

This is on modern i9 hardware.

This is a minute of results from counting to 60:

image

Does anyone have anecdotal evidence of real-time-ish transcription working with stream on Windows?

drmbt, Mar 16 '23 19:03

I am currently working on making whisper.cpp consume streams, since I didn't find the given stream example sufficient for my use case (transcription of continuous speech, not a command-based interface). I am going to process data in chunks of x seconds but preserve the model context between them, unlike the stream example, which instantiates a new model each time it has data to process. I will send a PR when/if I finish my streaming approach. So far it is going well with chunks (and thus latency) of 10 seconds.

My approach is to keep running inference with the same model to preserve the context of long speech. It may also become possible to detect a language change along the way.
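
To make the idea concrete, here is a rough Python sketch of fixed-length chunking with context carried from one chunk to the next. It is not whisper.cpp code and not the planned PR: `fixed_chunks`, `transcribe_stream`, and the `transcribe(chunk, context)` callback are hypothetical names used only for illustration, and the callback stands in for whatever actually runs inference.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def fixed_chunks(samples: Iterable[float], chunk_seconds: float,
                 sample_rate: int = SAMPLE_RATE) -> Iterator[List[float]]:
    """Group an incoming sample stream into chunks of chunk_seconds each."""
    chunk_len = int(chunk_seconds * sample_rate)
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == chunk_len:
            yield buf
            buf = []
    if buf:  # flush the final, possibly shorter, chunk
        yield buf


def transcribe_stream(samples: Iterable[float],
                      transcribe: Callable[[List[float], str], str],
                      chunk_seconds: float = 10.0,
                      sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Transcribe chunk by chunk, passing earlier text along as context.

    `transcribe` is a hypothetical callback (chunk, context) -> text.
    """
    context = ""
    pieces: List[str] = []
    for chunk in fixed_chunks(samples, chunk_seconds, sample_rate):
        text = transcribe(chunk, context)
        pieces.append(text)
        # Carry the accumulated transcript into the next call.
        context = (context + " " + text).strip()
    return pieces
```

With 10-second chunks, no text can appear before a full chunk has been captured, which matches the latency mentioned above. A real implementation would probably cap how much context it carries forward.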

dsseng, Mar 18 '23 07:03