Adjusting Sentence Length in SRT File Sync
Hello,
I am currently working on synchronizing lyrics in SRT files. However, I'm encountering an issue where the sentences are too long, and I would like to split them into individual lines for synchronization.
For example, the output from the web UI looks like this:
1
00:00:00,000 --> 00:00:18,559
The quiet night skies are so bright They shine softly through the window light
What I want is this:
1
00:00:00,000 --> 00:00:08,316
The quiet night skies are so bright
2
00:00:08,340 --> 00:00:18,559
They shine softly through the window light.
It seems that the AI recognizes "They" as the beginning of a new sentence because it's capitalized. Is there any option in the web UI settings to adjust the sentence length so that the lyrics are split appropriately across multiple lines?
Thank you for your help!
Hi. As far as I know this is difficult to achieve, as there's no such parameter in Whisper yet.
Related issues:
- https://github.com/openai/whisper/discussions/223#discussioncomment-3790591
We can think of some pre-processing or post-processing to work around this.
For pre-processing, you could use VAD with a short Min Speech Duration (ms), a short Min Silence Duration (ms), and a long Speech Pad (ms) to force short segments by inserting padding between each segment when transcribing.
Still, this can sometimes give bad results, because VAD often doesn't catch the very short silences between speeches, even with a short Min Silence Duration (ms).
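To illustrate the idea (this is a minimal sketch, not the web UI's actual VAD code, and `pad_segments` is a hypothetical helper), here is how a Speech Pad value interacts with segment boundaries: each VAD speech region is padded on both sides, and regions whose padded ranges touch get merged, so only gaps longer than the pad survive as cue boundaries.

```python
def pad_segments(segments, pad_ms=400):
    """Pad VAD speech timestamps and merge any overlaps.

    segments: list of (start_ms, end_ms) speech regions from VAD,
              assumed sorted by start time.
    pad_ms:   Speech Pad (ms) added to both sides of each region.
    Returns non-overlapping padded regions; regions whose padded
    ranges touch are merged into one, so a long pad joins nearby
    phrases while clearly silent gaps keep the segments separate.
    """
    if not segments:
        return []
    padded = [(max(0, s - pad_ms), e + pad_ms) for s, e in segments]
    merged = [list(padded[0])]
    for s, e in padded[1:]:
        if s <= merged[-1][1]:
            # Padded ranges overlap: extend the previous region.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [tuple(m) for m in merged]
```

This also shows why the result can be bad: if VAD never reports a silence between two phrases in the first place, there is nothing for the padding to separate.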
Another post-processing approach would be to simply cap the number of words per line when writing the subtitles, as suggested in https://github.com/openai/whisper/discussions/223#discussioncomment-7239823.
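A minimal sketch of that post-processing (the function and its parameter names are my own, not part of Whisper): split one over-long cue into chunks of at most N words, distributing the cue's time span proportionally to word count. Proportional timing is only an approximation; real word-level timestamps would be more accurate.

```python
def split_cue(text, start, end, max_words=8):
    """Split one subtitle cue into chunks of at most max_words words.

    start/end are the cue's timestamps in seconds. The duration of
    each chunk is proportional to its word count, which is a rough
    approximation of when the words were actually spoken.
    Returns a list of (start, end, text) tuples.
    """
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    total = len(words)
    cues, t = [], start
    for chunk in chunks:
        dur = (end - start) * len(chunk) / total
        cues.append((t, t + dur, " ".join(chunk)))
        t += dur
    return cues
```

With the example from this thread, `split_cue("The quiet night skies are so bright They shine softly through the window light", 0.0, 18.559, max_words=7)` yields two cues, one per sentence. The resulting tuples would still need to be rendered back into SRT timestamp format (`HH:MM:SS,mmm`).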
+) As far as I know, large-v3 gives more accurate timestamps. But large-v3 is only good with clean audio; if the audio is noisy, it often causes hallucinations.
+) Adding max_line_width would be helpful for this.
https://github.com/openai/whisper/blob/173ff7dd1d9fb1c4fddea0d41d704cfefeb8908c/whisper/transcribe.py#L559