whisper.cpp
How to improve accuracy of word level transcriptions
I plan to use whisper.cpp to edit audio by selecting the transcribed text, like Descript does, but its word-level timestamps are not accurate enough at the moment. For comparison, whisper_timestamped seems to provide the best (good enough) results.
There are two important considerations:
- accuracy of start and end times
- inclusion of filler words (em, hmm, ooh, ...)
Are there plans to improve in these areas? And can I contribute in some way even if I'm not familiar with C++?
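To make the use case concrete, here is a minimal Python sketch of the editing step, assuming word-level timestamps are already available as (text, start, end) records. The `Word` type and the `keep_spans` function are hypothetical names for illustration only and are not part of whisper.cpp or any other library. It computes which audio spans remain after the selected words are cut.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record for one transcribed word; times are in seconds.
    text: str
    start: float
    end: float


def keep_spans(words, removed, total):
    """Return the (start, end) audio spans left after cutting the words
    whose indices are in `removed`.

    Assumes `words` is sorted by start time and the words do not overlap.
    `total` is the duration of the whole recording in seconds.
    """
    spans = []
    cursor = 0.0  # start of the audio that has not been emitted yet
    for i, word in enumerate(words):
        if i in removed:
            # Keep everything between the previous cut and this word.
            if word.start > cursor:
                spans.append((cursor, word.start))
            cursor = max(cursor, word.end)
    # Keep whatever is left after the last cut.
    if cursor < total:
        spans.append((cursor, total))
    return spans


words = [
    Word("So", 0.00, 0.20),
    Word("um", 0.25, 0.60),
    Word("hello", 0.70, 1.10),
]

# Cut the filler word "um" (index 1) from a 1.2 second clip.
spans = keep_spans(words, removed={1}, total=1.20)
print(spans)
```

Every cut boundary comes directly from a word's start or end time, so any timestamp error clips a neighbouring word or leaves part of the removed one behind. A filler word that is missing from the transcript cannot be selected for removal at all.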
In my experience, WhisperX does a better job of correcting timestamps than whisper_timestamped does.