WhisperTimeSync
word_timestamps parameter
Hello,
Is it possible to generate and synchronize subtitles with Whisper's "word_timestamps" parameter?
Thank you!
@lucydjo That is the main goal of WhisperTimeSync. Did I misunderstand something in your question?
@lucydjo OK, now I understand. Some modifications have to be made to the data pre-processing.
I have used "java -Xmx2G -jar WhisperTimeSync/distrib/WhisperTimeSync.jar before_correct.srt original_text.txt fr" and it works great! But I have a problem with the output file.
See a sample of my data: https://gist.github.com/lucydjo/9ffea6ac4b60cd5a9c7b5fec7cb5126a
As you can see, there are empty lines and formatting problems... Do you know what I'm doing wrong? Thank you very much!
@lucydjo First of all, I do not understand why you do not get the exact, full original text in the output (especially the beginning, which is missing from the SRT). I suppose you truncated the result. You get a blank at timestamp 5 because WhisperTimeSync treats "l'Ozone" as a single word, while your timestamps split it into two parts ("l'" and "Ozone"). It therefore matches "l'Ozone" with "Ozone", leaving the "l'" segment empty.
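A minimal sketch of the mismatch, in Java since WhisperTimeSync is distributed as a jar. Splitting the original text on whitespace keeps "l'Ozone" as one token, whereas a hypothetical pre-processing step that splits after an apostrophe produces "l'" and "Ozone" as separate tokens, matching the way the word-level SRT cuts them. The class and method names (`ElisionSplit`, `whitespaceTokens`, `splitElisions`) are illustrative assumptions, not WhisperTimeSync's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ElisionSplit {

    // Whitespace tokenization: "l'Ozone" stays a single token.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Index of the first straight (') or typographic (U+2019) apostrophe, or -1.
    static int apostropheIndex(String s) {
        int a = s.indexOf('\'');
        int b = s.indexOf('\u2019');
        if (a < 0) return b;
        if (b < 0) return a;
        return Math.min(a, b);
    }

    // Hypothetical pre-processing: split after each inner apostrophe so an
    // elided article such as "l'" becomes its own token, as in the SRT.
    static List<String> splitElisions(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : whitespaceTokens(text)) {
            int idx = apostropheIndex(tok);
            // Only split when the apostrophe is neither the first nor the last character.
            while (idx > 0 && idx < tok.length() - 1) {
                out.add(tok.substring(0, idx + 1));
                tok = tok.substring(idx + 1);
                idx = apostropheIndex(tok);
            }
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "la couche de l'Ozone";
        System.out.println(whitespaceTokens(text)); // [la, couche, de, l'Ozone]
        System.out.println(splitElisions(text));    // [la, couche, de, l', Ozone]
    }
}
```

Splitting on every apostrophe would also cut words such as "aujourd'hui" in two, so a real pre-processing step would probably need an exception list or should follow whatever tokenization produced the timestamps.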