
[FEATURE REQUEST] Fix YouTube's autogenerated subtitles doubling

Open ershovev opened this issue 11 months ago • 2 comments

Is your feature request available in yt-dlp? Please describe.

Not available.

When you download automatic subtitles from YouTube, the result is a rolling subtitle: every time a new line is added, the previous one moves up a line, and if there are more than two lines, the first one disappears. Think Star Wars intro, but with only two lines.

A subtitle converted from VTT to SRT by yt-dlp would look something like this:

00:00 --> 00:03 
This is the first line

00:03 --> 00:10 
This is the first line
This is what happens when another line is added

00:10
This is what happens when another line is added
If a third one is added, the first one disappears and the second one shoots up

The problem with this is that it's really hard to read, since you expect both lines to change, and it becomes really distracting.
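
To make the rolling pattern concrete, here is a minimal Python sketch that collapses cues like the ones above so that each line is shown only once. The `Cue` tuple and the `collapse_rolling` function are hypothetical names used for illustration; they are not part of yt-dlp. The end time of the last cue (`00:15`) is an assumed value, since the example above does not give one.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: str
    end: str
    text: str


def collapse_rolling(cues: List[Cue]) -> List[Cue]:
    """Keep only the lines of each cue that the previous cue did not already show."""
    result: List[Cue] = []
    previous_lines: List[str] = []
    for cue in cues:
        lines = [line.strip() for line in cue.text.split("\n") if line.strip()]
        new_lines = [line for line in lines if line not in previous_lines]
        if new_lines:
            result.append(Cue(cue.start, cue.end, "\n".join(new_lines)))
        elif result:
            # Nothing new in this cue: extend the last emitted cue instead of repeating it.
            last = result[-1]
            result[-1] = Cue(last.start, cue.end, last.text)
        previous_lines = lines
    return result


rolling = [
    Cue("00:00", "00:03", "This is the first line"),
    Cue("00:03", "00:10",
        "This is the first line\nThis is what happens when another line is added"),
    # "00:15" is an assumed end time for illustration only.
    Cue("00:10", "00:15",
        "This is what happens when another line is added\n"
        "If a third one is added, the first one disappears"),
]

for cue in collapse_rolling(rolling):
    print(f"{cue.start} --> {cue.end}\n{cue.text}\n")
```

One limitation of this sketch: if a speaker really does say the same line twice in a row, the repeat is merged into the earlier cue rather than shown again.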

Describe the solution you'd like

Maybe a flag such as "Fix YouTube autogenerated subtitles doubling" in settings?

Users on GitHub and superuser.com suggest some fixes for ytdl:


def fix_youtube_vtt(vtt_file_path) -> str:
    """Fix YouTube's autogenerated VTT subtitles and return an SRT-formatted string."""

    import webvtt  # third-party package (webvtt-py)

    pretty_subtitle = ''
    previous_caption_text = ''
    previous_caption_start = ''
    i = 1
    for caption in webvtt.read(vtt_file_path):
        text = caption.text.strip()
        if not text:
            # Skip empty captions so they are never emitted or compared.
            continue

        if previous_caption_text == text:
            # If the previous and current captions are identical, emit one cue that runs
            # from the previous start time to the current end time.
            # SRT timestamps use a comma before the milliseconds, VTT uses a dot.
            start = previous_caption_start.replace('.', ',')
            end = caption.end.replace('.', ',')
            pretty_subtitle += f"{i}\n{start} --> {end}\n{previous_caption_text}\n\n"
            i += 1

        elif previous_caption_text == text.split("\n")[0]:
            # If the current caption is multiline and the previous caption equals its
            # first line, ignore the first line and continue with the second.
            previous_caption_text = text.split("\n")[1]
            previous_caption_start = caption.start

        else:
            previous_caption_text = text
            previous_caption_start = caption.start

    return pretty_subtitle

yt-dlp --embed-subs --merge-output-format mkv -f 'bv+ba' --write-auto-subs --sub-langs 'en' 'https://youtu.be/3_HG33-IYaY' --sub-format ttml --convert-subs srt --exec 'before_dl:fn=$(echo %(_filename)s| sed "s/%(ext)s/en.srt/g") && ffmpeg -fix_sub_duration -i "$fn" -c:s text "$fn".tmp.srt && mv "$fn".tmp.srt "$fn"'

function cleanVttFile($fileName, $outputName) {

    $lines = file($fileName);
    $headers = ['WEBVTT', 'Kind: captions', 'Language: en'];
    $modified_lines = [];
    $prev_line = "";

    foreach ($lines as $line) {
        // Keep header lines unchanged
        if (in_array(trim($line), $headers)) {
            $modified_lines[] = $line;
            continue;
        }

        // Keep timestamp lines and blank lines unchanged
        if (preg_match('/\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}.*/', $line) || trim($line) == "") {
            $modified_lines[] = $line;
            continue;
        }

        // Remove time tags
        $stripped_line = preg_replace('/<[^>]*>/', '', $line);

        // Compare with previous line
        if ($stripped_line != $prev_line || $prev_line == "") {
            $modified_lines[] = $line;
        }

        // Update previous line
        $prev_line = $stripped_line;
    }

    file_put_contents($outputName, $modified_lines);
}
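
For comparison, the same line-by-line idea can be sketched in Python using only the standard library. This is a rough port under stated assumptions, not a tested replacement: `clean_vtt_text` is a hypothetical name, it works on a string rather than on files, and it does not special-case the header lines (they pass through anyway because they never repeat the previous text line). The sample input is made up to resemble YouTube's auto-generated VTT.

```python
import re

# Same patterns as the PHP version: a VTT timestamp line and inline <...> tags.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")
TAG = re.compile(r"<[^>]*>")


def clean_vtt_text(vtt: str) -> str:
    """Drop a text line when it repeats the previous text line (tags ignored)."""
    kept = []
    prev = ""
    for line in vtt.splitlines():
        # Timestamp lines and blank lines pass through unchanged.
        if TIMESTAMP.match(line) or not line.strip():
            kept.append(line)
            continue
        stripped = TAG.sub("", line)
        if stripped != prev or prev == "":
            kept.append(line)
        prev = stripped
    return "\n".join(kept)


# Made-up sample for illustration only.
sample = """WEBVTT

00:00:00.000 --> 00:00:02.000
hello<00:00:01.000><c> world</c>

00:00:02.000 --> 00:00:02.010
hello world

00:00:02.010 --> 00:00:04.000
hello world
next<00:00:03.000><c> line</c>
"""

print(clean_vtt_text(sample))
```

Like the PHP function, this only removes the repeated text lines; a cue whose text was entirely removed keeps its timestamp line, so empty cues may remain in the output.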

ershovev avatar Mar 21 '24 17:03 ershovev

@ershovev you need to file this issue in the yt-dlp repository, not here. They will be able to handle this. I don't code the yt-dlp core itself, just the Android app interface for it.

zaednasr avatar Mar 21 '24 18:03 zaednasr


Got it, sorry

According to these topics, it seems that they are not planning to fix it

https://github.com/yt-dlp/yt-dlp/issues/6274 https://github.com/yt-dlp/yt-dlp/issues/1734

ershovev avatar Mar 21 '24 18:03 ershovev