How do I remove blank lines from VTT subtitles?

Open zdoek001 opened this issue 1 year ago • 9 comments

WEBVTT

00:00:00.086 --> 00:00:00.961

xxxxx

00:00:01.166 --> 00:00:02.586

xxxxx

Why is there always a blank line between the timestamp line and the cue text?

zdoek001 avatar Jul 03 '24 18:07 zdoek001

I don't understand. Is it doing something different from the format described at https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#webvtt_files ?

rany2 avatar Jul 03 '24 18:07 rany2

Strange, the first line in my VTT subtitles is always a blank line

zdoek001 avatar Jul 03 '24 18:07 zdoek001

Normally it should be like this:

WEBVTT

00:00:00.086 --> 00:00:00.961
xxxxx

00:00:01.166 --> 00:00:02.586
xxxxx

And mine is this:

WEBVTT

00:00:00.086 --> 00:00:00.961

xxxxx

00:00:01.166 --> 00:00:02.586

xxxxx

zdoek001 avatar Jul 03 '24 18:07 zdoek001

I have an internal version of edge-tts which has many subtitle fixes (especially noticeable for Chinese) and uses pysrt for subtitle generation, so this issue should be fixed there. That said, I never had this issue in the first place, so :/

rany2 avatar Jul 03 '24 18:07 rany2

Is a new version coming soon? I'm hoping it can generate SRT directly.

zdoek001 avatar Jul 03 '24 18:07 zdoek001

If you're keen, you could test it out. I pushed my WIP branch: https://github.com/rany2/edge-tts/tree/wip-subtitles

rany2 avatar Jul 03 '24 18:07 rany2

It needs to be simplified a bit more before it's ready; right now it's more of a bodge and a concept. There are some issues, so it's not ready to be in master yet: the TTS service may rewrite the input text and then return the rewritten text in the word boundary events.

For example, if you ask the TTS service to synthesize "1k.m.", the service rewrites it internally as "1 kilometer" and the mapping back to the input text fails. I've attempted to fix such issues, but it's still a WIP.
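To make the failure concrete, here is a toy illustration. It is not edge-tts's mapping code; `locate_words` is a hypothetical helper that looks up each word reported by the service in the original input, and the reported words are assumed for the sake of the example.

```python
def locate_words(text: str, words: list[str]) -> list[int]:
    """Return the offset of each word in text, searching left to right.

    A word that cannot be found in the remaining text gets offset -1.
    """
    offsets = []
    cursor = 0
    for word in words:
        index = text.find(word, cursor)
        offsets.append(index)
        if index != -1:
            # Continue searching after the word that was just matched.
            cursor = index + len(word)
    return offsets


# Words reported by the service match the input: every offset is found.
plain = locate_words("hello world", ["hello", "world"])

# Hypothetical rewritten words: "kilometer" never appears in the input,
# so there is no offset to map it back to.
rewritten = locate_words("1k.m.", ["1", "kilometer"])
```

Any subtitle generator that relies on such a lookup has to decide what to do with the words that come back without an offset.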

rany2 avatar Jul 03 '24 18:07 rany2

Using newline="\n" in the with open(...) as file: call fixed the issue on my Windows device. It seems to be a Linux/Windows line-ending difference. See https://stackoverflow.com/questions/9184107/how-can-i-force-pythons-file-write-to-use-the-same-newline-format-in-windows

Line 31 in the async subtitle example should be adjusted.
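A minimal sketch of why this helps, assuming the generated subtitle string already contains "\r\n" line endings (SAMPLE_SUBS below is a stand-in for the real output, and write_text is a hypothetical helper). Passing newline="\r\n" reproduces the translation that text mode applies by default on Windows, so the effect can be seen on any platform:

```python
import os
import tempfile

# Stand-in for the string returned by submaker.generate_subs(); it is
# assumed here to already use "\r\n" line endings.
SAMPLE_SUBS = "WEBVTT\r\n\r\n00:00:00.086 --> 00:00:00.961\r\nxxxxx\r\n\r\n"


def write_text(path: str, text: str, newline: str) -> bytes:
    """Write text with the given newline mode and return the raw bytes on disk."""
    with open(path, "w", encoding="utf-8", newline=newline) as file:
        file.write(text)
    with open(path, "rb") as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.vtt")
    # newline="\r\n" mimics the default translation on Windows:
    # every "\n" is written as "\r\n", so "\r\n" turns into "\r\r\n".
    windows_like = write_text(path, SAMPLE_SUBS, newline="\r\n")
    # newline="\n" disables the translation, so the text is written as-is.
    untranslated = write_text(path, SAMPLE_SUBS, newline="\n")

doubled_endings = windows_like.count(b"\r\r\n")
written_unchanged = untranslated == SAMPLE_SUBS.encode("utf-8")
```

Many readers treat "\r\r\n" as two line breaks, which would explain the extra blank line under each timing line.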

GerFr avatar Jul 19 '24 23:07 GerFr

You can delete these blank lines after the subtitles have been written to the VTT file. Using the example streaming_with_subtitles.py, add this code:

with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
    file.write(submaker.generate_subs())

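# Remove the blank line below each cue timing line: the timing line's own
# line break is stripped, so the blank line's break ends it instead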
with open(WEBVTT_FILE, "r", encoding="utf-8") as file:
    lines = file.readlines()
with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
    for line in lines:
        if "-->" in line:
            file.write(line.strip() + " ")
        else:
            file.write(line)

This makes it possible to play the audio together with the VTT file in players such as mpv and MPC-HC. Otherwise the subtitles are not displayed, because the players treat the file as invalid due to the incorrect format.

Ideally, this should also be fixed in the CLI.
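The same cleanup could also be done in memory before the file is written. This is only a sketch: drop_blank_after_timing is a hypothetical helper, not part of edge-tts, and it assumes cue text never contains "-->".

```python
def drop_blank_after_timing(vtt_text: str) -> str:
    """Remove a blank line that directly follows a cue timing line."""
    cleaned = []
    previous_was_timing = False
    # splitlines() also splits on a lone "\r", so "\r\r\n" yields an
    # empty line just like "\n\n" does.
    for line in vtt_text.splitlines():
        if previous_was_timing and line.strip() == "":
            previous_was_timing = False
            continue  # skip the stray blank line under the timing line
        cleaned.append(line)
        previous_was_timing = "-->" in line
    return "\n".join(cleaned) + "\n"


broken = (
    "WEBVTT\n\n"
    "00:00:00.086 --> 00:00:00.961\n\nxxxxx\n\n"
    "00:00:01.166 --> 00:00:02.586\n\nxxxxx\n"
)
fixed = drop_blank_after_timing(broken)
```

Unlike the file-based version above, this leaves no trailing space after the timing line, and the blank lines that separate cues are kept.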

FaintWhisper avatar Aug 02 '24 04:08 FaintWhisper