Transcribing long audio produces text without punctuation
It sometimes transcribes long videos (about 1-2 h) without any punctuation. Is there a way to force it to always include punctuation?
I'd like to see an improvement on this too; that would be a dream 🙂
See https://github.com/ggerganov/whisper.cpp/issues/393#issuecomment-1376164100 for some hints
I think the problem is related to what is described here: https://youtu.be/9T-1SnepFho. I fear it's unavoidable.
I usually just transcribe audiobooks chapter by chapter, as long as each chapter is around 30 minutes. If chapters are longer than that, I split the audiobook into 20- or 30-minute segments and transcribe those. This way, if the model enters a "lost punctuation" phase that would otherwise continue until the end, it usually only affects a few minutes. When transcribing something like a 50-hour audiobook in one pass, the entire transcription ends up useless. I only use the large model, as medium and medium.en tend not to produce punctuation the way large usually does. Another approach is to remove any stretch of silence lasting 3 seconds or more.
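A minimal Python sketch of the two workarounds above (splitting into chunks of about 30 minutes and cutting out silences of 3 seconds or more). It assumes the audio has already been decoded to 16 kHz mono 16-bit PCM, which is what whisper.cpp expects, for example with `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`. The function names and the amplitude threshold of 500 are hypothetical choices for illustration, not part of whisper.cpp, and a fixed amplitude threshold is only a crude stand-in for proper voice-activity detection.

```python
from typing import List, Sequence, Tuple

# Assumption: audio is 16 kHz mono, the sample rate whisper.cpp expects.
SAMPLE_RATE = 16_000


def find_silences(samples: Sequence[int], sample_rate: int = SAMPLE_RATE,
                  threshold: int = 500,
                  min_silence_s: float = 3.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges where every sample's absolute
    amplitude stays below `threshold` for at least `min_silence_s` seconds.
    The threshold of 500 is a hypothetical value; tune it for your audio."""
    min_len = int(min_silence_s * sample_rate)
    silences: List[Tuple[int, int]] = []
    run_start = None  # index where the current quiet run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                silences.append((run_start, i))
            run_start = None
    # A quiet run that lasts until the end of the audio also counts.
    if run_start is not None and len(samples) - run_start >= min_len:
        silences.append((run_start, len(samples)))
    return silences


def remove_silences(samples: Sequence[int],
                    silences: List[Tuple[int, int]]) -> List[int]:
    """Drop the given (start, end) ranges, which must be sorted and
    non-overlapping, and concatenate the remaining audio."""
    kept: List[int] = []
    pos = 0
    for start, end in silences:
        kept.extend(samples[pos:start])
        pos = end
    kept.extend(samples[pos:])
    return kept


def chunk_ranges(n_samples: int, sample_rate: int = SAMPLE_RATE,
                 chunk_s: float = 30 * 60) -> List[Tuple[int, int]]:
    """Split n_samples into consecutive (start, end) ranges of at most
    chunk_s seconds (30 minutes by default), one per transcription run."""
    chunk_len = int(chunk_s * sample_rate)
    return [(start, min(start + chunk_len, n_samples))
            for start in range(0, n_samples, chunk_len)]


if __name__ == "__main__":
    # Toy example at 10 "samples per second" so the numbers stay small:
    # 1 s of sound, 4 s of silence, 1 s of sound.
    demo = [1000] * 10 + [0] * 40 + [1000] * 10
    gaps = find_silences(demo, sample_rate=10)
    print(gaps)                              # [(10, 50)]
    print(len(remove_silences(demo, gaps)))  # 20
    print(chunk_ranges(2500, sample_rate=10, chunk_s=100))
    # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Removing silences shifts all later timestamps, so if you need timestamps that line up with the original recording you would have to map them back yourself. Writing each chunk to its own WAV file (for example with the standard `wave` module) is left out to keep the sketch short.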
I have had success getting ggml-small.en-q5_1.bin to add punctuation to otherwise unpunctuated lectures by adding the simple prompt --prompt 'Always include punctuation.'
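If you combine the prompt trick with splitting a long recording into chunks, the same prompt can be passed on every run. The sketch below only builds (and optionally runs) the command lines. The binary path `./main` (it may be `whisper-cli` in newer whisper.cpp builds), the model path, and the chunk file names are assumptions; only the `-m`, `-f` and `--prompt` options are used.

```python
import shlex
import subprocess
from typing import List

# Assumptions: binary and model locations depend on your whisper.cpp build.
WHISPER_BIN = "./main"  # may be "whisper-cli" in newer whisper.cpp builds
MODEL = "models/ggml-small.en-q5_1.bin"
PROMPT = "Always include punctuation."


def build_command(wav_path: str, binary: str = WHISPER_BIN,
                  model: str = MODEL, prompt: str = PROMPT) -> List[str]:
    """Argument list for one whisper.cpp run on a single chunk,
    passing the same initial prompt every time."""
    return [binary, "-m", model, "-f", wav_path, "--prompt", prompt]


def transcribe_chunks(wav_paths: List[str], dry_run: bool = True) -> None:
    """Print (dry run) or execute one whisper.cpp command per chunk."""
    for path in wav_paths:
        cmd = build_command(path)
        if dry_run:
            print(shlex.join(cmd))  # show the shell-quoted command only
        else:
            subprocess.run(cmd, check=True)  # raises if whisper.cpp fails


if __name__ == "__main__":
    # Hypothetical chunk names, e.g. from splitting a long audiobook.
    transcribe_chunks([f"chunk_{i:03d}.wav" for i in range(3)])
```

Passing an argument list to `subprocess.run` avoids shell-quoting problems with the spaces in the prompt; `shlex.join` is only used to display the equivalent shell command.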