whisper.cpp
Can the accuracy of the timestamp be improved?
The timestamps produced by whisper are not very accurate. Below is a comparison of the same two segments as timed by Microsoft Cognitive Services Speech and by whisper.
1
00:00:00,120 --> 00:00:01,379 (Microsoft)
[00:00:00.000 --> 00:00:02.000] (whisper)
2
00:00:02,120 --> 00:00:06,320 (Microsoft)
[00:00:02.000 --> 00:00:07.500] (whisper)
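To put numbers on the gap, here is a minimal Python sketch that parses both timestamp styles shown above and reports how far whisper's boundaries are from Microsoft's. The function names (`to_ms`, `parse_range`, `boundary_offsets`) are made up for this illustration and are not part of whisper.cpp or any Microsoft SDK.

```python
import re

# Matches both "00:00:01,379" (SRT style) and "00:00:02.000" (whisper style).
_TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_ms(ts: str) -> int:
    """Convert an HH:MM:SS,mmm or HH:MM:SS.mmm timestamp to milliseconds."""
    m = _TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {ts!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms

def parse_range(line: str) -> tuple[int, int]:
    """Extract (start_ms, end_ms) from a line containing 'start --> end'."""
    stamps = [m.group(0) for m in _TS.finditer(line)]
    if len(stamps) != 2:
        raise ValueError(f"expected two timestamps in: {line!r}")
    return to_ms(stamps[0]), to_ms(stamps[1])

def boundary_offsets(reference: str, candidate: str) -> tuple[int, int]:
    """Return (start, end) offsets in ms of candidate relative to reference."""
    r_start, r_end = parse_range(reference)
    c_start, c_end = parse_range(candidate)
    return c_start - r_start, c_end - r_end

# The two segments from the comparison above.
print(boundary_offsets("00:00:00,120 --> 00:00:01,379",
                       "[00:00:00.000 --> 00:00:02.000]"))  # (-120, 621)
print(boundary_offsets("00:00:02,120 --> 00:00:06,320",
                       "[00:00:02.000 --> 00:00:07.500]"))  # (-120, 1180)
```

In both segments whisper starts 120 ms early, and its end times run 621 ms and 1180 ms past Microsoft's.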
Yes, this would be much appreciated. I'm not sure how much can be done without retraining the model(s), though. I suppose you are using the large model? I've found the smaller models to be less accurate.
Btw, for the original whisper there's the stable-ts fork; maybe that can provide some inspiration. See here: https://github.com/openai/whisper/discussions/435
The timestamp precision is a limitation of the model. You would need some sort of pre- or post-processing to improve the timestamps, but at the moment it is not clear what the best approach is.
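As one possible illustration of post-processing, the sketch below shrinks a segment so that it starts and ends on audio that is actually above a loudness threshold. This is an assumption made for the example: it is not what whisper.cpp or stable-ts does, and the names `frame_rms` and `trim_segment`, the 20 ms frame size, and the 0.01 threshold are all hypothetical. Real speech would likely need a proper voice activity detector rather than a fixed energy threshold.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def trim_segment(samples, sample_rate, start_s, end_s,
                 frame_ms=20, threshold=0.01):
    """Shrink [start_s, end_s] to the first and last frames louder than threshold.

    samples: mono audio as floats in [-1, 1].
    Returns the segment unchanged if no frame exceeds the threshold.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    lo = int(start_s * sample_rate)
    hi = int(end_s * sample_rate)
    energies = frame_rms(samples[lo:hi], frame_len)
    loud = [i for i, e in enumerate(energies) if e > threshold]
    if not loud:
        return start_s, end_s
    new_start = (lo + loud[0] * frame_len) / sample_rate
    new_end = (lo + (loud[-1] + 1) * frame_len) / sample_rate
    return new_start, new_end

# Synthetic 3 s clip at 1 kHz: constant-level "sound" from 0.12 s to 1.38 s,
# silence elsewhere. The segment to refine is 0.0 s to 2.0 s.
rate = 1000
audio = [0.5 if 120 <= n < 1380 else 0.0 for n in range(3 * rate)]
start, end = trim_segment(audio, rate, 0.0, 2.0)
print(round(start, 3), round(end, 3))  # 0.12 1.38
```

The resolution of the result is limited by the frame size, so boundaries can only move in 20 ms steps here.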
Apparently, work has already been done to improve the timestamps: https://github.com/jianfch/stable-ts