chat-downloader
[FEATURE] Subtitle Format Support?
Is there any way you could make it possible to export the chat to a subtitle format like SRT or ASS? I don't just want to back up the chat; I want to watch the video with the chat in VLC.
Thank you.
This sounds like an interesting idea. The thing is, you could already do this with the Python module. I spent 5 minutes implementing a basic version, which could look something like this:
from chat_downloader import ChatDownloader

def seconds_to_time(seconds):
    int_seconds = int(seconds)
    h, remainder = divmod(abs(int_seconds), 3600)
    m, s = divmod(remainder, 60)
    decimal = str(round(float(seconds) - int_seconds, 3))[2:]
    return f"{'-' if seconds < 0 else ''}{h:02}:{m:02}:{s:02},{decimal}"

url = 'https://www.youtube.com/watch?v=VlWb1RONsIw'
chat = ChatDownloader().get_chat(url, start_time=0)  # create a generator

max_duration = 5
counter = 1
last_time = 0
for message in chat:
    current_time = min(message['time_in_seconds'], last_time + max_duration)
    # Output to console
    print(counter)
    print(f'{seconds_to_time(last_time)} --> {seconds_to_time(current_time)}')
    print(f"{message['author']['name']}: {message['message']}")
    print()
    last_time = message['time_in_seconds']
    counter += 1
which outputs:
1
00:00:00,0 --> 00:00:05,0
Adam Hložek: so hikaru isnt playing?
2
00:00:10,242 --> 00:00:12,375
Swiss Reyes: future world champion the best from the WEST and SOlid in the SOuth
3
00:00:12,375 --> 00:00:17,375
Joakim Raatikainen: It's chess I guess???
4
00:00:32,123 --> 00:00:32,532
singhalarjun19: waiting for chessbrah to say ... is this theory ? :face_with_tongue:
5
00:00:32,532 --> 00:00:37,532
Adam Hložek: IS HIKARU PLAYING, YES OR NO?
6
00:01:09,42 --> 00:01:14,42
Donald Metzger: Not this tournament, although Hikaru qualifies by points. He is almost certain to have enough points for the final tournament already
7
00:01:14,936 --> 00:01:19,936
Mark Shark: Hikaru is streaming the tourney
8
00:01:31,302 --> 00:01:33,825
Adam Hložek: rip
9
00:01:33,825 --> 00:01:38,825
Rick Jena: lets go hottub Carlsen show em the power of hottub
10
00:02:07,565 --> 00:02:12,565
Marcin Beski: I wish a lot of luck to Jan Krzysztof Duda
11
00:02:45,184 --> 00:02:50,184
Alekhine Battery: Would you like to have David Howell, Tania Sachdev or Simon Williams review your chess games live? Leave a comment in this community post! - https://bit.ly/2WuKsvh
12
00:02:51,756 --> 00:02:52,111
Alekhine Battery: Would you like to have David Howell, Tania Sachdev or Simon Williams review your chess games live? Leave a comment in this community post! - https://bit.ly/2WuKsvh
13
00:02:52,111 --> 00:02:57,111
Geshvad Nasiri: so will win again
14
00:03:09,174 --> 00:03:14,174
David Trottier: Are we going to see infamous aimchess ads throughout this tour?
15
00:03:21,281 --> 00:03:24,921
Rick Jena: I would rather choose lithium battery
16
00:03:24,921 --> 00:03:29,921
Karan Anvekar: Hi
17
00:03:37,584 --> 00:03:42,584
Alekhine 2255: guys will there be dubov in next tournament
This is just a very basic example, so I'm sure you could improve it. For example, I added a "maximum duration" (a message will not stay on the screen for more than 5 seconds), but you could improve it so that messages do not disappear too quickly.
That being said, if there is enough demand, or someone wants to improve upon this basic version, I'd be happy to add it to the software. For example, by adding a --output chat.srt option.
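The "do not disappear too quickly" improvement mentioned above could be sketched roughly as follows. This is only an illustration, not part of chat-downloader: the function name cue_intervals and the min_duration parameter are hypothetical, and it assumes each message is shown starting at its own timestamp, with timestamps already sorted in ascending order.

```python
from typing import List, Sequence, Tuple


def cue_intervals(timestamps: Sequence[float],
                  min_duration: float = 1.5,
                  max_duration: float = 5.0) -> List[Tuple[float, float]]:
    """Return one (start, end) pair in seconds per message timestamp.

    Hypothetical helper: assumes `timestamps` is sorted in ascending order.
    """
    cues = []
    for i, start in enumerate(timestamps):
        next_start = timestamps[i + 1] if i + 1 < len(timestamps) else float('inf')
        # By default a message stays up until the next one arrives,
        # but never longer than max_duration.
        end = min(next_start, start + max_duration)
        # Enforce a floor so that a quick follow-up message does not
        # make this one vanish almost immediately.
        end = max(end, start + min_duration)
        cues.append((start, end))
    return cues


if __name__ == '__main__':
    for start, end in cue_intervals([0.0, 0.4, 10.0]):
        print(f'{start:.3f} --> {end:.3f}')
```

With a minimum duration, consecutive cues can overlap. How overlapping SRT cues are drawn depends on the player, so this trade-off is worth testing in the player you intend to use.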
Thanks for taking the time to respond. Unfortunately, I do not know how to code, but yes, I would love to see SRT or ASS support if possible. It would be cool if the subtitles were formatted in danmaku/Nico Nico Douga style too, so multiple comments can be on the screen at the same time. Something like this: https://github.com/m13253/danmaku2ass
I'm a Japanese student, and getting the chat into a subtitle format would help me. I'm sure people who want to archive VTubers and other Japanese YouTubers would be interested in this as well. Thanks again.
@Sephy1 Here's a script to download to ASS (supports scrolling comments and multiple comments onscreen at the same time) or SRT (does not support scrolling or multiple comments onscreen):
import argparse
import os
from chat_downloader import ChatDownloader
from typing import List
class ChatMessage:
    TimestampSeconds: float
    Author: str
    MessageText: str

    def __init__(self, timestamp_seconds: float, author: str, message_text: str) -> None:
        self.TimestampSeconds = timestamp_seconds
        self.Author = author
        self.MessageText = message_text

class SrtLine:
    Index: int
    StartTimeSeconds: float
    EndTimeSeconds: float
    Author: str
    MessageText: str

    def __init__(self, index: int, start_time_seconds: float, end_time_seconds: float, author: str, message_text: str) -> None:
        self.Index = index
        self.StartTimeSeconds = start_time_seconds
        self.EndTimeSeconds = end_time_seconds
        self.Author = author
        self.MessageText = message_text

    def __seconds_to_timestamp(self, seconds: float):
        int_seconds = int(seconds)
        h, remainder = divmod(abs(int_seconds), 3600)
        m, s = divmod(remainder, 60)
        milliseconds = round(1000 * (float(seconds) - int_seconds))
        return f"{'-' if seconds < 0 else ''}{h:02}:{m:02}:{s:02},{milliseconds:03}"

    def to_string(self) -> str:
        return f'{self.Index}\n{self.__seconds_to_timestamp(self.StartTimeSeconds)} --> {self.__seconds_to_timestamp(self.EndTimeSeconds)}\n<font color="#00FF00">{self.Author}</font>: {self.MessageText}\n\n'

assHeader = """[Script Info]
ScriptType: v4.00+
Collisions: Normal
PlayResX: 640
PlayResY: 480
Timer: 100.0000
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Myriad Web Pro Condensed,26,&H00ffffff,&H0000ffff,&H0025253a,&H96000000,0,0,0,0,100,100,0,0.00,1,2,1,2,15,15,20,1
[Events]
Format: Layer, Start, End, Style, Actor, MarginL, MarginR, MarginV, Effect, Text
"""

class AssLine:
    StartTimeSeconds: float
    EndTimeSeconds: float
    Author: str
    MessageText: str

    def __init__(self, start_time_seconds: float, end_time_seconds: float, author: str, message_text: str) -> None:
        self.StartTimeSeconds = start_time_seconds
        self.EndTimeSeconds = end_time_seconds
        self.Author = author
        self.MessageText = message_text

    def __seconds_to_timestamp(self, seconds: float):
        int_seconds = int(seconds)
        h, remainder = divmod(abs(int_seconds), 3600)
        m, s = divmod(remainder, 60)
        hundredths = round(100 * (float(seconds) - int_seconds))
        return f"{'-' if seconds < 0 else ''}{h:01}:{m:02}:{s:02}.{hundredths:02}"

    def to_string(self) -> str:
        fadeMilliseconds = round(1000 * (self.EndTimeSeconds - self.StartTimeSeconds) / 20)
        return f'Dialogue: 0,{self.__seconds_to_timestamp(self.StartTimeSeconds)},{self.__seconds_to_timestamp(self.EndTimeSeconds)},,,0000,0000,0000,,{{\\move(320,480,320,360)}}{{\\fad({fadeMilliseconds},{fadeMilliseconds})}}{{\\1c&H00FF00&}}{self.Author}: {{\\1c&HFFFFFF&}}{self.MessageText}\n'

def even_spaced_timestamp_filter(chat_messages: List[ChatMessage], smoothing_interval_seconds: float = 5):
    """Smooths out chat message timestamps within regularly-spaced intervals, so that timestamps are more evenly-spaced. This helps readability when bursts of several messages occur at nearly the same time."""
    if len(chat_messages) == 0:
        return
    if smoothing_interval_seconds <= 0:
        raise ValueError(f'smoothing_interval_seconds must be positive, but was {smoothing_interval_seconds}')
    minIndex = 0
    maxIndex = -1
    minTimestamp = 0
    maxTimestamp = smoothing_interval_seconds
    lastTimestamp = chat_messages[-1].TimestampSeconds
    while minTimestamp < lastTimestamp:
        # Advance maxIndex to the last message that falls inside the current interval.
        while maxIndex + 1 < len(chat_messages) and chat_messages[maxIndex + 1].TimestampSeconds < maxTimestamp:
            maxIndex += 1
        commentsInInterval = maxIndex - minIndex + 1
        if commentsInInterval > 0:
            # Spread the messages evenly across the interval.
            for i in range(0, commentsInInterval):
                chat_messages[minIndex + i].TimestampSeconds = minTimestamp + (2 * i + 1) * smoothing_interval_seconds / (2 * commentsInInterval)
        minIndex = maxIndex + 1
        minTimestamp += smoothing_interval_seconds
        maxTimestamp += smoothing_interval_seconds

def parse_chat_messages(chats) -> List[ChatMessage]:
    chatMessages: List[ChatMessage] = []
    for chat in chats:
        messageText: str = chat['message']
        # Replace shorthand emotes, like :partying_face:, with UTF, like 🥳.
        emotes = chat.get('emotes')
        if emotes:
            for emote in emotes:
                utfId = emote['id']
                shortcuts = emote['shortcuts']
                # "Custom emojis" use sprite images, not UTF characters, and SRT cannot display images, so ignore these.
                isNotCustomEmoji = not emote['is_custom_emoji']
                if utfId and shortcuts and isNotCustomEmoji:
                    for shortcut in shortcuts:
                        messageText = messageText.replace(shortcut, utfId)
        chatMessages.append(ChatMessage(
            timestamp_seconds=chat['time_in_seconds'],
            author=chat['author']['name'],
            message_text=messageText))
    return chatMessages

def parse_srt_lines(chat_messages: List[ChatMessage], max_seconds_onscreen: float = 5) -> List[SrtLine]:
    if max_seconds_onscreen <= 0:
        raise ValueError(f'max_seconds_onscreen must be positive, but was {max_seconds_onscreen}')
    srtLines: List[SrtLine] = []
    for index, chatMessage in enumerate(chat_messages):
        nextTimestampSeconds = chat_messages[index + 1].TimestampSeconds if index + 1 < len(chat_messages) else float("inf")
        srtLines.append(SrtLine(
            index=index + 1,  # SRT cue numbers conventionally start at 1
            start_time_seconds=chatMessage.TimestampSeconds,
            end_time_seconds=min(nextTimestampSeconds, chatMessage.TimestampSeconds + max_seconds_onscreen),
            author=chatMessage.Author,
            message_text=chatMessage.MessageText))
    return srtLines

def parse_ass_lines(chat_messages: List[ChatMessage], max_seconds_onscreen: float = 5, grouping_interval_seconds: float = 5, max_subtitles_onscreen: int = 5) -> List[AssLine]:
    if max_seconds_onscreen <= 0:
        raise ValueError(f'max_seconds_onscreen must be positive, but was {max_seconds_onscreen}')
    if grouping_interval_seconds <= 0:
        raise ValueError(f'grouping_interval_seconds must be positive, but was {grouping_interval_seconds}')
    if max_subtitles_onscreen <= 0:
        raise ValueError(f'max_subtitles_onscreen must be positive, but was {max_subtitles_onscreen}')
    assLines: List[AssLine] = []
    if len(chat_messages) == 0:
        return assLines
    minTimestamp = 0
    maxTimestamp = grouping_interval_seconds
    lastTimestamp = chat_messages[-1].TimestampSeconds
    minIndex = 0
    maxIndex = -1
    while minTimestamp < lastTimestamp:
        # Advance maxIndex to the last message that falls inside the current interval.
        while maxIndex + 1 < len(chat_messages) and chat_messages[maxIndex + 1].TimestampSeconds < maxTimestamp:
            maxIndex += 1
        commentsInInterval = maxIndex - minIndex + 1
        if commentsInInterval > 0:
            subtitlesPerSecond = commentsInInterval / grouping_interval_seconds
            for i in range(0, commentsInInterval):
                chatMessage = chat_messages[minIndex + i]
                # Busier intervals get shorter display times, so that roughly
                # max_subtitles_onscreen messages are visible at once.
                timeOnscreen = min(max_subtitles_onscreen / subtitlesPerSecond, max_seconds_onscreen)
                assLines.append(AssLine(
                    start_time_seconds=chatMessage.TimestampSeconds,
                    end_time_seconds=chatMessage.TimestampSeconds + timeOnscreen,
                    author=chatMessage.Author,
                    message_text=chatMessage.MessageText))
        minIndex = maxIndex + 1
        minTimestamp += grouping_interval_seconds
        maxTimestamp += grouping_interval_seconds
    return assLines

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--url', required=True)
    # type= is needed so that values passed on the command line are parsed as numbers, not strings.
    parser.add_argument('--max_seconds_onscreen', required=False, type=float, default=5)
    parser.add_argument('--smoothing_interval_seconds', required=False, type=float, default=5)
    parser.add_argument('--title', required=False, default='subtitles')
    subparsers = parser.add_subparsers(dest='command', required=True)
    parser_srt = subparsers.add_parser('srt')
    parser_ass = subparsers.add_parser('ass')
    parser_ass.add_argument('--max_subtitles_onscreen', required=False, type=int, default=5)
    args = parser.parse_args()

    chatMessages = parse_chat_messages(ChatDownloader().get_chat(args.url))
    even_spaced_timestamp_filter(chatMessages, args.smoothing_interval_seconds)
    if args.command == 'srt':
        lines = parse_srt_lines(chatMessages, args.max_seconds_onscreen)
    elif args.command == 'ass':
        lines = parse_ass_lines(chatMessages, args.max_seconds_onscreen, args.smoothing_interval_seconds, args.max_subtitles_onscreen)

    filePath = os.path.join(os.getcwd(), f'{args.title}.{args.command}')
    with open(filePath, 'w', encoding='utf-8') as file:
        if args.command == 'ass':
            file.write(assHeader)
        for line in lines:
            file.write(line.to_string())
    print(f'Wrote subtitles to {filePath}')

It's similar to xenova's example above, with a few changes:
- Fixes a bug where emojis are left in their raw input form, like :partying_face:, instead of 🥳.
- Fixes a bug where messages whose millisecond timestamp ended with 1 or 2 zeros (such as 32.500 seconds) were output with trailing zeros removed, which causes them to appear onscreen longer than they should.
- Adds an evenly-spaced timestamp filter, so that when several people post comments at the same time, the spike of comments gets smoothed out over time, which makes comments easier to read.
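To make the trailing-zero fix concrete, here is a small side-by-side of the two ways the fractional part is formatted in this thread. The function names fraction_unpadded and fraction_padded are made up for this illustration; the expressions inside them are taken from the two scripts above.

```python
def fraction_unpadded(seconds: float) -> str:
    """Fractional part as formatted in the first example (no zero padding)."""
    int_seconds = int(seconds)
    return str(round(float(seconds) - int_seconds, 3))[2:]


def fraction_padded(seconds: float) -> str:
    """Fractional part as formatted in the second script (always 3 digits)."""
    int_seconds = int(seconds)
    milliseconds = round(1000 * (float(seconds) - int_seconds))
    return f"{milliseconds:03}"


if __name__ == '__main__':
    # 32.5 seconds: the first form yields "5", the second yields "500".
    print(fraction_unpadded(32.5), fraction_padded(32.5))
```

SRT timestamps are written with three millisecond digits (HH:MM:SS,mmm), so the padded form is the safer one. How a given player reads a shorter fraction such as ",5" depends on its parser.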
To use it, run pip install chat-downloader if you haven't already, save the script to a file like srt_subtitle_downloader.py, and then run it:
python srt_subtitle_downloader.py --url https://www.youtube.com/watch?v=k-S4ZRlMf6Q ass
If you run python srt_subtitle_downloader.py --help, it'll print out a help menu. The ASS scrolling is vertical (messages start at the bottom of the screen and move upwards), whereas Nico-Nico is horizontal (messages start at the right of the screen and move left). You can tweak the code to make them scroll horizontally, though.
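As a rough idea of what the horizontal tweak could look like, the sketch below builds a right-to-left \move tag that could replace the fixed \move(320,480,320,360) in AssLine.to_string. The function name horizontal_move_tag, the row_height value, and the text_width_estimate value are all assumptions for illustration; the script does not measure text width, and 640 is the PlayResX from the header above.

```python
def horizontal_move_tag(row: int,
                        play_res_x: int = 640,
                        row_height: int = 30,
                        text_width_estimate: int = 400) -> str:
    """Build an ASS \\move tag that scrolls a line from right to left.

    Hypothetical helper. With the bottom-center alignment used by the
    Default style, x is the horizontal center of the text, so the start
    and end points are pushed half a text width beyond each screen edge.
    """
    y = (row + 1) * row_height
    start_x = play_res_x + text_width_estimate // 2   # fully off the right edge
    end_x = -(text_width_estimate // 2)               # fully off the left edge
    return f"{{\\move({start_x},{y},{end_x},{y})}}"


if __name__ == '__main__':
    # Cycle messages through a few rows so simultaneous comments do not overlap.
    rows = 5
    for message_index in range(3):
        print(horizontal_move_tag(message_index % rows))
```

Cycling through rows by message index is the simplest possible lane assignment. Tools like danmaku2ass do more careful collision handling, which this sketch does not attempt.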
@xenova It might be cool to add a --output chat.srt option to chat-downloader itself as you suggested, but the even_spaced_timestamp_filter requires having all comments upfront, which doesn't work with the continuous_write streaming abstraction, which assumes output can be done comment-by-comment. Any thoughts? Possible solutions:
- Add a special case if --output ends with .srt that downloads all the chat, and then writes it to SRT in one go. This might be a little ugly, but would sidestep the problem.
- Change continuous_write.py to accept the full list of chat comments. This would be a breaking change in behavior, since the point of continuous_write.py was to support streaming comments as they are downloaded.
- Remove the even_spaced_timestamp_filter. This would allow using the existing output streaming interface, but it makes subtitles harder to read.
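One further possibility, which is not among the options listed above and is offered only as a sketch: since the filter works on fixed intervals, it may be enough to buffer one interval at a time instead of the whole chat. The generator below is hypothetical (smooth_streaming and _spread are made-up names, and it is not tied to continuous_write.py); it assumes messages arrive as (timestamp, text) pairs in ascending timestamp order and applies the same spacing formula as even_spaced_timestamp_filter.

```python
from typing import Iterable, Iterator, List, Tuple


def _spread(window_start: float, interval: float,
            texts: List[str]) -> List[Tuple[float, str]]:
    """Place the buffered messages at evenly spaced points inside one window."""
    n = len(texts)
    return [(window_start + (2 * i + 1) * interval / (2 * n), text)
            for i, text in enumerate(texts)]


def smooth_streaming(messages: Iterable[Tuple[float, str]],
                     interval: float = 5.0) -> Iterator[Tuple[float, str]]:
    """Yield smoothed (timestamp, text) pairs, buffering one window at a time.

    Assumption: `messages` is sorted by timestamp in ascending order.
    """
    if interval <= 0:
        raise ValueError(f'interval must be positive, but was {interval}')
    current_window = None
    buffered: List[str] = []
    for timestamp, text in messages:
        # Negative timestamps (pre-stream chat) are folded into the first window.
        window = max(int(timestamp // interval), 0)
        if current_window is not None and window != current_window:
            # The previous window is complete, so it can be emitted now.
            yield from _spread(current_window * interval, interval, buffered)
            buffered = []
        current_window = window
        buffered.append(text)
    if buffered:
        yield from _spread(current_window * interval, interval, buffered)


if __name__ == '__main__':
    chat = [(0.1, 'a'), (0.2, 'b'), (7.0, 'c')]
    for timestamp, text in smooth_streaming(chat):
        print(timestamp, text)
```

The trade-off is latency: a message is only emitted once a message from a later interval arrives (or the input ends), so output lags the download by up to one interval.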
Oh hey, I made this! Check out https://github.com/9001/softchat
EDIT: The current output modes, -m1 and -m2, are designed for high-speed chats (10+ messages per second) -- I'll add a more conventional subtitle view, -m3, for slower chats :>