whisper.cpp

Can the accuracy of the timestamp be improved?

Open czkoko opened this issue 1 year ago • 3 comments

Whisper's timestamps are not very accurate. The following compares the timestamps from Microsoft Cognitive Services Speech with those from whisper for the same audio.

1
00:00:00,120 --> 00:00:01,379 (Microsoft)
[00:00:00.000 --> 00:00:02.000] (whisper)
2
00:00:02,120 --> 00:00:06,320 (Microsoft)
[00:00:02.000 --> 00:00:07.500] (whisper)

czkoko avatar Dec 10 '22 17:12 czkoko
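
To put numbers on the comparison above, here is a minimal Python sketch that parses both timestamp styles and reports how far each whisper boundary is from the Microsoft one. The helper names `to_ms` and `boundary_offsets` are made up for illustration, and treating the Microsoft output as the reference is an assumption, not an established ground truth.

```python
import re

def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' or 'HH:MM:SS.mmm' to milliseconds."""
    h, m, s, ms = map(int, re.split(r"[:.,]", stamp.strip()))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

# (start, end) pairs copied from the comparison in the issue
microsoft = [("00:00:00,120", "00:00:01,379"), ("00:00:02,120", "00:00:06,320")]
whisper = [("00:00:00.000", "00:00:02.000"), ("00:00:02.000", "00:00:07.500")]

def boundary_offsets(reference, candidate):
    """Per segment, return (start_diff_ms, end_diff_ms) as candidate minus reference."""
    return [
        (to_ms(c_start) - to_ms(r_start), to_ms(c_end) - to_ms(r_end))
        for (r_start, r_end), (c_start, c_end) in zip(reference, candidate)
    ]

offsets = boundary_offsets(microsoft, whisper)
for i, (d_start, d_end) in enumerate(offsets, 1):
    print(f"segment {i}: start {d_start:+d} ms, end {d_end:+d} ms")
```

For these two segments, whisper starts 120 ms earlier than Microsoft in both cases and ends 621 ms and 1180 ms later, so the whisper segments appear to include extra audio on both sides.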

Yes, this would be much appreciated. I'm not sure how much can be done without retraining the model(s), though. I suppose you are using the large model? I've found the smaller models to be less accurate.

Btw, for the original Whisper there's the stable-ts fork; maybe that can provide some inspiration. See here: https://github.com/openai/whisper/discussions/435

misutoneko avatar Dec 11 '22 00:12 misutoneko

The timestamp precision is a limitation of the model. You would need some sort of pre- or post-processing to improve the timestamps, but at the moment it is not clear what the best approach is.

ggerganov avatar Dec 11 '22 18:12 ggerganov
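
As one illustration of what such post-processing might look like, the Python sketch below trims a segment so that it starts and ends on audio frames whose energy exceeds a threshold. This is only a crude energy-based heuristic chosen for the example. It is not what whisper.cpp or stable-ts does, and the function names, the 10 ms frame size, and the 0.01 threshold are all assumptions.

```python
import math

def frame_rms(samples, sample_rate, frame_ms=10):
    """RMS energy of consecutive non-overlapping frames of frame_ms each."""
    n = max(1, sample_rate * frame_ms // 1000)  # samples per frame
    return [
        math.sqrt(sum(x * x for x in samples[i:i + n]) / len(samples[i:i + n]))
        for i in range(0, len(samples), n)
    ]

def trim_to_speech(start_ms, end_ms, energies, frame_ms=10, threshold=0.01):
    """Shrink [start_ms, end_ms] to the first and last frames above threshold."""
    first = start_ms // frame_ms
    last = min(len(energies), -(-end_ms // frame_ms))  # exclusive, rounded up
    active = [i for i in range(first, last) if energies[i] > threshold]
    if not active:
        return start_ms, end_ms  # nothing above threshold; leave unchanged
    return active[0] * frame_ms, (active[-1] + 1) * frame_ms

# Synthetic 2 s signal at 1 kHz: silence, a +/-0.5 square wave, then silence.
sample_rate = 1000
samples = (
    [0.0] * 120
    + [0.5 if i % 2 == 0 else -0.5 for i in range(1260)]
    + [0.0] * 620
)
energies = frame_rms(samples, sample_rate)

# A segment reported as 0 ms to 2000 ms is tightened to the active region.
trimmed = trim_to_speech(0, 2000, energies)
print(trimmed)
```

On this synthetic signal the segment is tightened to (120, 1380). Real speech has pauses, breaths, and background noise, so a fixed threshold would need tuning or replacement by a proper voice activity detector.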

Apparently, some work has already been done to improve timestamps: https://github.com/jianfch/stable-ts

pneyrinck avatar Dec 12 '22 17:12 pneyrinck