
word_timestamps parameter

Open lucydjo opened this issue 1 year ago • 4 comments

Hello,

Is it possible to generate and synchronize subtitles with Whisper's "word_timestamps" parameter?

Thank you!

lucydjo · Sep 06 '23 03:09
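For context on the question, here is a minimal sketch of how Whisper's word-level timestamps could be written out as a one-word-per-cue SRT file, which could then be passed to WhisperTimeSync. It assumes the result dict has the shape produced by the openai-whisper package's `transcribe(..., word_timestamps=True)`, where each segment carries a `"words"` list of `{"word", "start", "end"}` entries. The helper names `format_srt_time` and `words_to_srt` are hypothetical and are not part of Whisper or WhisperTimeSync.

```python
"""Sketch: turn Whisper word-level timestamps into a one-word-per-cue SRT.

Assumes a result dict shaped like openai-whisper's output with
word_timestamps=True: result["segments"][i]["words"] is a list of
{"word": str, "start": float, "end": float} dicts (hypothetical helpers).
"""


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(result: dict) -> str:
    """Build SRT text with one numbered cue per recognized word."""
    cues = []
    index = 1
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            text = w["word"].strip()  # Whisper words usually carry a leading space
            if not text:
                continue  # skip empty tokens rather than emitting blank cues
            cues.append(
                f"{index}\n"
                f"{format_srt_time(w['start'])} --> {format_srt_time(w['end'])}\n"
                f"{text}\n"
            )
            index += 1
    # Each cue ends with "\n"; joining with "\n" leaves one blank line between cues.
    return "\n".join(cues)
```

With openai-whisper installed, `whisper.load_model("base").transcribe("audio.mp3", word_timestamps=True)` should return a dict of this shape ("audio.mp3" is a placeholder), and `words_to_srt(result)` can then be written to an `.srt` file.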

@lucydjo That's the main goal of WhisperTimeSync. Did I misunderstand something in your question?

EtienneAb3d · Sep 06 '23 07:09

@lucydjo OK, now I understand. Some modifications have to be made to the data pre-processing.

EtienneAb3d · Sep 06 '23 07:09

I used "java -Xmx2G -jar WhisperTimeSync/distrib/WhisperTimeSync.jar before_correct.srt original_text.txt fr" and it works great! But I have a problem with the output file.

See a sample of my data: https://gist.github.com/lucydjo/9ffea6ac4b60cd5a9c7b5fec7cb5126a

As you can see, there are empty lines and formatting problems... Do you know what I'm doing wrong? Thank you very much!

lucydjo · Sep 06 '23 19:09

@lucydjo First of all, I do not understand why you do not get the exact, full original text in the output (especially the beginning, which is missing from the SRT). I suppose you truncated the result. You get a blank at timestamp 5 because WhisperTimeSync treats "l'Ozone" as a single word, while your timestamps split it into two parts. It therefore matches "l'Ozone" with "Ozone", leaving the "l'" part empty.

EtienneAb3d · Sep 07 '23 07:09
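As a possible user-side workaround for the "l'Ozone" mismatch, one could merge French elisions in the word list before building the SRT, so each cue covers the same unit WhisperTimeSync sees as one word. This is a hypothetical sketch, not WhisperTimeSync's own pre-processing fix: it assumes word entries shaped like openai-whisper's `{"word", "start", "end"}` dicts, and `merge_elisions` is an illustrative name. A word ending in an apostrophe is joined with the next word, and the merged cue spans from the first word's start to the second word's end.

```python
"""Hypothetical pre-processing sketch: merge French elisions such as
"l'" + "Ozone" into one word entry before building the SRT.

Assumes word dicts shaped like openai-whisper's word_timestamps output:
{"word": str, "start": float, "end": float}.
"""

APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophes


def merge_elisions(words: list[dict]) -> list[dict]:
    """Return a new list where a word ending in an apostrophe is joined
    with the following word, spanning both time ranges."""
    merged: list[dict] = []
    pending = None  # an elided prefix (e.g. " l'") waiting for its next word
    for w in words:
        if pending is not None:
            # Merged entries keep only word/start/end; extra keys are dropped.
            w = {
                "word": pending["word"].rstrip() + w["word"].lstrip(),
                "start": pending["start"],
                "end": w["end"],
            }
            pending = None
        if w["word"].rstrip().endswith(APOSTROPHES):
            pending = w  # chains naturally if the merged word is itself elided
        else:
            merged.append(w)
    if pending is not None:  # trailing elision with nothing to attach to
        merged.append(pending)
    return merged
```

Running each segment's `"words"` list through this function before writing the SRT would turn the two cues " l'" and "Ozone" into a single " l'Ozone" cue, which should avoid the empty "l'" cue described above.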