ytdlnis
[FEATURE REQUEST] Fix YouTube's autogenerated subtitles doubling
Is your feature request available in yt-dlp? Please describe.
Not available.
When you download automatic subtitles from YouTube, the result is a rolling subtitle: every time a new line is added, the previous one moves up a line, and once there are more than two lines, the first one disappears. Think of the Star Wars intro, but with only two lines.
A subtitle converted from VTT to SRT by yt-dlp would look something like this:
00:00 --> 00:03
This is the first line
00:03 --> 00:10
This is the first line
This is what happens when another line is added
00:10
This is what happens when another line is added
If a third line is added, the first one disappears and the second one moves up.
The problem is that this is really hard to read: you expect both lines to change together, so the line that moves up becomes distracting.
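The rolling pattern can be modeled in a few lines. This is a minimal Python sketch, assuming the cues are already parsed into a start time, an end time and a list of text lines. The `Cue` type and `new_lines_only` function are hypothetical and not part of yt-dlp or ytdlnis. For each cue, the sketch keeps only the lines that were not already shown at the end of the previous cue.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    """Hypothetical in-memory cue: start/end timestamps and its text lines."""
    start: str
    end: str
    lines: List[str]


def new_lines_only(cues: List[Cue]) -> List[Cue]:
    """Drop leading lines that repeat the tail of the previous cue.

    In a rolling (two-line) subtitle, each cue starts with the line(s) the
    previous cue ended with; only the remaining lines are new.
    """
    result: List[Cue] = []
    previous: List[str] = []
    for cue in cues:
        lines = list(cue.lines)
        # Find the longest prefix of this cue that equals a suffix of the previous cue.
        overlap = 0
        for k in range(min(len(previous), len(lines)), 0, -1):
            if previous[-k:] == lines[:k]:
                overlap = k
                break
        fresh = lines[overlap:]
        # A cue that only repeats already-shown text is dropped entirely.
        if fresh:
            result.append(Cue(cue.start, cue.end, fresh))
        previous = lines
    return result
```

With the three cues from the example, the second cue is reduced to its new line and the third cue, which only repeats that line, is dropped. Real YouTube cues also contain inline timing tags, which this sketch does not handle.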
Describe the solution you'd like
Maybe a flag such as "Fix YouTube autogenerated subtitles doubling" in the settings?
Users on GitHub and superuser.com have suggested some fixes:
1) A Python function that uses the webvtt-py library:
import webvtt


def fix_youtube_vtt(vtt_file_path) -> str:
    """Fixes YouTube's autogenerated VTT subtitles and returns an SRT-formatted string."""
    pretty_subtitle = ''
    previous_caption_text = ''
    previous_caption_start = ''
    i = 1
    for caption in webvtt.read(vtt_file_path):
        text = caption.text.strip()
        if previous_caption_text and previous_caption_text == text:
            # If the previous and current captions are identical, use the start time
            # of the previous one and the end time of the current one.
            # SRT separates milliseconds with a comma, WebVTT with a period.
            start = previous_caption_start.replace('.', ',')
            end = caption.end.replace('.', ',')
            pretty_subtitle += f"{i}\n{start} --> {end}\n{previous_caption_text}\n\n"
            i += 1
        elif "\n" in text and previous_caption_text == text.split("\n")[0]:
            # If the current caption is multiline and its first line equals the
            # previous caption, ignore the first line and continue with the second.
            previous_caption_text = text.split("\n")[1]
            previous_caption_start = caption.start
        else:
            previous_caption_text = text
            previous_caption_start = caption.start
    return pretty_subtitle
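To try the merge logic of `fix_youtube_vtt` without installing webvtt-py, it can be reproduced on plain `(start, end, text)` tuples. This is a hypothetical stdlib-only variant (`merge_rolling_cues` is not from the source). It mirrors the three branches of the function above, and the sample cues in the usage check are made up for illustration.

```python
from typing import Iterable, Tuple


def merge_rolling_cues(cues: Iterable[Tuple[str, str, str]]) -> str:
    """Stdlib-only mirror of fix_youtube_vtt: takes (start, end, text) tuples
    with WebVTT timestamps and returns an SRT-formatted string."""
    out = []
    prev_text, prev_start = "", ""
    index = 1
    for start, end, text in cues:
        text = text.strip()
        if prev_text and text == prev_text:
            # Same text as the pending line: emit it from its first start to this end.
            # SRT separates milliseconds with a comma, WebVTT with a period.
            srt_start = prev_start.replace(".", ",")
            srt_end = end.replace(".", ",")
            out.append(f"{index}\n{srt_start} --> {srt_end}\n{prev_text}\n\n")
            index += 1
        elif "\n" in text and text.split("\n")[0] == prev_text:
            # Rolling cue: the first line was already shown, the second becomes pending.
            prev_text, prev_start = text.split("\n")[1], start
        else:
            prev_text, prev_start = text, start
    return "".join(out)
```

Each line is emitted once, spanning from the cue where it first appeared to the end of the cue that repeats it.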
2) A yt-dlp command that downloads the auto-generated subtitles as TTML, converts them to SRT and rewrites the SRT file with ffmpeg's -fix_sub_duration:
yt-dlp --embed-subs --merge-output-format mkv -f 'bv+ba' --write-auto-subs --sub-langs 'en' 'https://youtu.be/3_HG33-IYaY' --sub-format ttml --convert-subs srt --exec 'before_dl:fn=$(echo %(_filename)s| sed "s/%(ext)s/en.srt/g") && ffmpeg -fix_sub_duration -i "$fn" -c:s text "$fn".tmp.srt && mv "$fn".tmp.srt "$fn"'
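The `--exec` step above can also be run outside yt-dlp. This is a hypothetical Python sketch of the same two shell commands (`ffmpeg -fix_sub_duration ...` followed by `mv`), and the helper names are illustrative. According to the FFmpeg documentation, `-fix_sub_duration` adjusts each subtitle's duration so it does not overlap the next one in the same stream. One deliberate difference: the `sed "s/%(ext)s/en.srt/g"` expression replaces every occurrence of the extension anywhere in the filename, while `subtitle_path` below only replaces the trailing extension.

```python
import subprocess
from pathlib import Path
from typing import List


def subtitle_path(media_filename: str, ext: str, lang: str = "en") -> str:
    """Map e.g. 'video.mkv' to 'video.en.srt' by replacing only the trailing extension."""
    suffix = "." + ext
    base = media_filename[: -len(suffix)] if media_filename.endswith(suffix) else media_filename
    return f"{base}.{lang}.srt"


def fix_sub_duration_cmd(srt_path: str) -> List[str]:
    # Equivalent to: ffmpeg -fix_sub_duration -i "$fn" -c:s text "$fn".tmp.srt
    return ["ffmpeg", "-fix_sub_duration", "-i", srt_path, "-c:s", "text", srt_path + ".tmp.srt"]


def fix_srt_in_place(srt_path: str) -> None:
    """Run ffmpeg, then move the temporary file over the original (like `mv`)."""
    cmd = fix_sub_duration_cmd(srt_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    Path(cmd[-1]).replace(srt_path)
```

Passing an argument list to `subprocess.run` avoids the shell quoting that the one-line `--exec` version has to handle.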
3) A PHP function that drops a caption line when, with its inline tags removed, it repeats the previous caption line:
function cleanVttFile($fileName, $outputName) {
    $lines = file($fileName);
    $headers = ['WEBVTT', 'Kind: captions', 'Language: en'];
    $modified_lines = [];
    $prev_line = "";
    foreach ($lines as $line) {
        // Keep header lines unchanged
        if (in_array(trim($line), $headers)) {
            $modified_lines[] = $line;
            continue;
        }
        // Keep timestamp lines and blank lines unchanged
        if (preg_match('/\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}.*/', $line) || trim($line) == "") {
            $modified_lines[] = $line;
            continue;
        }
        // Remove inline tags such as <00:00:01.000> and <c>...</c>
        $stripped_line = preg_replace('/<[^>]*>/', '', $line);
        // Keep the line only if its text differs from the previous caption line
        if ($stripped_line != $prev_line || $prev_line == "") {
            $modified_lines[] = $line;
        }
        // Remember this line's text for the next comparison
        $prev_line = $stripped_line;
    }
    // file_put_contents() accepts an array and writes its elements back to back
    file_put_contents($outputName, $modified_lines);
}
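For use alongside the Python snippet, the PHP `cleanVttFile` function can be ported almost line for line. This is a hypothetical stdlib-only translation (`clean_vtt_lines` is not from the source) that works on a list of lines instead of reading and writing files. Like the original, it only compares each caption text line with the previous caption text line, so it assumes the duplicate immediately follows the line it repeats, and the header list is hard-coded for English.

```python
import re
from typing import List

# Same header lines, cue-timing pattern and tag pattern as the PHP version.
HEADERS = {"WEBVTT", "Kind: captions", "Language: en"}
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")
TAG = re.compile(r"<[^>]*>")


def clean_vtt_lines(lines: List[str]) -> List[str]:
    """Drop a caption text line if, once tags are removed, it equals the previous one."""
    kept: List[str] = []
    prev = ""
    for line in lines:
        stripped = line.strip()
        # Headers, cue timings and blank lines pass through unchanged.
        if stripped in HEADERS or TIMESTAMP.search(line) or stripped == "":
            kept.append(line)
            continue
        text = TAG.sub("", line)
        if text != prev or prev == "":
            kept.append(line)
        # Only caption text lines update the comparison target.
        prev = text
    return kept
```

To process a file, read it with `Path(...).read_text().splitlines()` and write the result back joined with newlines.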
@ershovev you need to open this issue in the yt-dlp repository, not here. They will be able to handle it. I don't develop yt-dlp itself, just the Android app interface for it.
Got it, sorry
According to these issues, it seems they are not planning to fix it:
https://github.com/yt-dlp/yt-dlp/issues/6274 https://github.com/yt-dlp/yt-dlp/issues/1734