chat-downloader
                                
[FEATURE] Subtitle Format Support?
Is there any way you could make it so I can export the file to a subtitle format like SRT or ASS? I don't just want to back up the chat; I want to watch the video with the chat in VLC.
Thank you.
This sounds like an interesting idea. That said, you could already build it yourself using the Python module. I spent 5 minutes implementing a basic version, which could look something like this:
from chat_downloader import ChatDownloader
def seconds_to_time(seconds):
    int_seconds = int(seconds)
    h, remainder = divmod(abs(int_seconds), 3600)
    m, s = divmod(remainder, 60)
    decimal = str(round(float(seconds) - int_seconds, 3))[2:]
    return f"{'-' if seconds < 0 else ''}{h:02}:{m:02}:{s:02},{decimal}"
url = 'https://www.youtube.com/watch?v=VlWb1RONsIw'
chat = ChatDownloader().get_chat(url, start_time=0)       # create a generator
max_duration = 5
counter = 1
last_time = 0
for message in chat:
    current_time = min(message['time_in_seconds'], last_time + max_duration)
    # Output to console
    print(counter)
    print(f'{seconds_to_time(last_time)} --> {seconds_to_time(current_time)}')
    print(f"{message['author']['name']}: {message['message']}")
    print()
    last_time = message['time_in_seconds']
    counter += 1
which outputs:
1
00:00:00,0 --> 00:00:05,0
Adam Hložek: so hikaru isnt playing?
2
00:00:10,242 --> 00:00:12,375
Swiss Reyes: future world champion the best from the WEST and SOlid in the SOuth
3
00:00:12,375 --> 00:00:17,375
Joakim Raatikainen: It's chess I guess???
4
00:00:32,123 --> 00:00:32,532
singhalarjun19: waiting for chessbrah to say ... is this theory ? :face_with_tongue:
5
00:00:32,532 --> 00:00:37,532
Adam Hložek: IS HIKARU PLAYING, YES OR NO?
6
00:01:09,42 --> 00:01:14,42
Donald Metzger: Not this tournament, although Hikaru qualifies by points. He is almost certain to have enough points for the final tournament already
7
00:01:14,936 --> 00:01:19,936
Mark Shark: Hikaru is streaming the tourney
8
00:01:31,302 --> 00:01:33,825
Adam Hložek: rip
9
00:01:33,825 --> 00:01:38,825
Rick Jena: lets go hottub Carlsen show em the power of hottub
10
00:02:07,565 --> 00:02:12,565
Marcin Beski: I wish a lot of luck to Jan Krzysztof Duda
11
00:02:45,184 --> 00:02:50,184
Alekhine Battery: Would you like to have David Howell, Tania Sachdev or Simon Williams review your chess games live? Leave a comment in this community post! - https://bit.ly/2WuKsvh
12
00:02:51,756 --> 00:02:52,111
Alekhine Battery: Would you like to have David Howell, Tania Sachdev or Simon Williams review your chess games live? Leave a comment in this community post! - https://bit.ly/2WuKsvh
13
00:02:52,111 --> 00:02:57,111
Geshvad Nasiri: so will win again
14
00:03:09,174 --> 00:03:14,174
David Trottier: Are we going to see infamous aimchess ads throughout this tour?
15
00:03:21,281 --> 00:03:24,921
Rick Jena: I would rather choose lithium battery
16
00:03:24,921 --> 00:03:29,921
Karan Anvekar: Hi
17
00:03:37,584 --> 00:03:42,584
Alekhine 2255: guys will there be dubov in next tournament
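A note on the timestamps above: SRT timestamps are normally written as HH:MM:SS,mmm with exactly three millisecond digits, but `seconds_to_time` builds the fraction from `str(round(...))[2:]`, which drops trailing zeros. So 69.42 s comes out as `00:01:09,42`, which a player will likely read as 69.042 s. A sketch of a formatter that always pads to three digits (the name `srt_timestamp` is just for illustration):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (always 3 millisecond digits)."""
    # Round to whole milliseconds up front so e.g. 59.9996 s carries into the next second
    total_ms = round(abs(seconds) * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    sign = '-' if seconds < 0 else ''
    return f"{sign}{hours:02}:{minutes:02}:{secs:02},{millis:03}"

print(srt_timestamp(69.42))    # 00:01:09,420
print(srt_timestamp(0))        # 00:00:00,000
print(srt_timestamp(59.9996))  # 00:01:00,000
```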
This is just a very basic example, so I'm sure it could be improved. For example, I added a "maximum duration" (a message will not stay on screen for more than 5 seconds), but you could also improve it so that messages do not disappear too quickly.
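One possible way to handle both limits is sketched below, assuming you already have the message timestamps as a sorted list of seconds. Unlike the loop above, each message starts at its own timestamp. `display_windows`, `min_duration` and `max_duration` are illustrative names, not part of chat-downloader:

```python
from typing import List, Tuple

def display_windows(times: List[float], min_duration: float = 2.0,
                    max_duration: float = 5.0) -> List[Tuple[float, float]]:
    """Give each message a (start, end) window: it starts at its own timestamp and
    stays until the next message arrives, clamped to [min_duration, max_duration]."""
    if not 0 < min_duration <= max_duration:
        raise ValueError('need 0 < min_duration <= max_duration')
    windows = []
    for i, start in enumerate(times):
        next_start = times[i + 1] if i + 1 < len(times) else float('inf')
        duration = min(max(next_start - start, min_duration), max_duration)
        windows.append((start, start + duration))
    return windows

print(display_windows([0.0, 1.0, 4.0, 20.0]))
# [(0.0, 2.0), (1.0, 4.0), (4.0, 9.0), (20.0, 25.0)]
```

With a minimum duration, consecutive cues can overlap (the first two above do); how, or whether, overlapping SRT cues are stacked on screen depends on the player.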
That being said, if there is enough demand, or if someone wants to improve on this basic version, I'd be happy to add it to the software, for example as a --output chat.srt option.
Thanks for taking the time to respond. Unfortunately, I don't know how to code, but yes, I would love to see SRT or ASS support if possible. It would be cool if the subtitles were formatted in danmaku/Nico Nico Douga style too, so multiple comments can be on screen at the same time. Something like this: https://github.com/m13253/danmaku2ass
I'm a Japanese student, and getting the chat into a subtitle format would help me. I'm sure people who want to archive VTubers and other Japanese YouTubers would be interested in this as well. Thanks again.
@Sephy1 Here's a script that downloads the chat to ASS (supports scrolling comments and multiple comments onscreen at the same time) or SRT (does not support scrolling or multiple comments onscreen):
import argparse
import os
from chat_downloader import ChatDownloader
from typing import List
class ChatMessage:
    TimestampSeconds: float
    Author: str
    MessageText: str
    def __init__(self, timestamp_seconds: float, author:str, message_text: str) -> None:
        self.TimestampSeconds = timestamp_seconds
        self.Author = author
        self.MessageText = message_text
class SrtLine:
    Index: int
    StartTimeSeconds: float
    EndTimeSeconds: float
    Author: str
    MessageText: str
    def __init__(self, index: int, start_time_seconds: float, end_time_seconds: float, author: str, message_text: str) -> None:
        self.Index = index
        self.StartTimeSeconds = start_time_seconds
        self.EndTimeSeconds = end_time_seconds
        self.Author = author
        self.MessageText = message_text
    def __seconds_to_timestamp(self, seconds: float):
        # Round to whole milliseconds first, so e.g. 59.9996 carries over to 00:01:00,000
        total_milliseconds = round(abs(seconds) * 1000)
        h, remainder = divmod(total_milliseconds, 3_600_000)
        m, remainder = divmod(remainder, 60_000)
        s, milliseconds = divmod(remainder, 1000)
        return f"{'-' if seconds < 0 else ''}{h:02}:{m:02}:{s:02},{milliseconds:03}"
    def to_string(self) -> str:
        return f'{self.Index}\n{self.__seconds_to_timestamp(self.StartTimeSeconds)} --> {self.__seconds_to_timestamp(self.EndTimeSeconds)}\n<font color="#00FF00">{self.Author}</font>: {self.MessageText}\n\n'
assHeader = """[Script Info]
ScriptType: v4.00+
Collisions: Normal
PlayResX: 640
PlayResY: 480
Timer: 100.0000
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Myriad Web Pro Condensed,26,&H00ffffff,&H0000ffff,&H0025253a,&H96000000,0,0,0,0,100,100,0,0.00,1,2,1,2,15,15,20,1
[Events]
Format: Layer, Start, End, Style, Actor, MarginL, MarginR, MarginV, Effect, Text
"""
class AssLine:
    StartTimeSeconds: float
    EndTimeSeconds: float
    Author: str
    MessageText: str
    def __init__(self, start_time_seconds: float, end_time_seconds: float, author: str, message_text: str) -> None:
        self.StartTimeSeconds = start_time_seconds
        self.EndTimeSeconds = end_time_seconds
        self.Author = author
        self.MessageText = message_text
    def __seconds_to_timestamp(self, seconds: float):
        # Round to whole centiseconds first, so e.g. 59.996 carries over to 0:01:00.00
        total_hundredths = round(abs(seconds) * 100)
        h, remainder = divmod(total_hundredths, 360_000)
        m, remainder = divmod(remainder, 6_000)
        s, hundredths = divmod(remainder, 100)
        return f"{'-' if seconds < 0 else ''}{h:01}:{m:02}:{s:02}.{hundredths:02}"
    def to_string(self) -> str:
        fadeMilliseconds = round(1000 * (self.EndTimeSeconds - self.StartTimeSeconds) / 20)
        return f'Dialogue: 0,{self.__seconds_to_timestamp(self.StartTimeSeconds)},{self.__seconds_to_timestamp(self.EndTimeSeconds)},,,0000,0000,0000,,{{\\move(320,480,320,360)}}{{\\fad({fadeMilliseconds},{fadeMilliseconds})}}{{\\1c&H00FF00&}}{self.Author}: {{\\1c&HFFFFFF&}}{self.MessageText}\n'
def even_spaced_timestamp_filter(chat_messages: List[ChatMessage], smoothing_interval_seconds: float = 5):
    """Smooths out chat message timestamps within regularly-spaced intervals, so that timestamps are more evenly-spaced. This helps readability when bursts of several messages occur at nearly the same time."""
    if len(chat_messages) == 0:
        return
    if smoothing_interval_seconds <= 0:
        raise ValueError(f'smoothingIntervalSeconds must be positive, but was {smoothing_interval_seconds}')
    minIndex = 0
    maxIndex = -1
    minTimestamp = 0
    maxTimestamp = smoothing_interval_seconds
    lastTimestamp = chat_messages[-1].TimestampSeconds
    # <= so a final message landing exactly on an interval boundary is still processed
    while minTimestamp <= lastTimestamp:
        while maxIndex + 1 < len(chat_messages) and chat_messages[maxIndex + 1].TimestampSeconds < maxTimestamp:
            maxIndex += 1
        commentsInInterval = maxIndex - minIndex + 1
        if commentsInInterval > 0:
            for i in range(0, commentsInInterval):
                chat_messages[minIndex + i].TimestampSeconds = minTimestamp + (2 * i + 1) * smoothing_interval_seconds / (2 * commentsInInterval)
        minIndex = maxIndex + 1
        minTimestamp += smoothing_interval_seconds
        maxTimestamp += smoothing_interval_seconds
def parse_chat_messages(chats) -> List[ChatMessage]:
    chatMessages: List[ChatMessage] = []
    for chat in chats:
        messageText: str = chat['message']
        # Replace shorthand emotes, like :partying_face:, with UTF, like 🥳.
        emotes = chat.get('emotes')
        if emotes:
            for emote in emotes:
                utfId = emote['id']
                shortcuts = emote['shortcuts']
                # "Custom emojis" use sprite images, not UTF characters, and SRT cannot display images, so ignore these.
                isNotCustomEmoji = not emote['is_custom_emoji']
                if utfId and shortcuts and isNotCustomEmoji:
                    for shortcut in shortcuts:
                        messageText = messageText.replace(shortcut, utfId)
        chatMessages.append(ChatMessage(
            timestamp_seconds=chat['time_in_seconds'],
            author=chat['author']['name'],
            message_text=messageText))
    return chatMessages
def parse_srt_lines(chat_messages: List[ChatMessage], max_seconds_onscreen: float = 5) -> List[SrtLine]:
    if max_seconds_onscreen <= 0:
        raise ValueError(f'max_seconds_onscreen must be positive, but was {max_seconds_onscreen}')
    srtLines: List[SrtLine] = []
    for index, chatMessage in enumerate(chat_messages):
        nextTimestampSeconds = chat_messages[index + 1].TimestampSeconds if index + 1 < len(chat_messages) else float("inf")
        srtLines.append(SrtLine(
            index=index + 1,  # SRT cue numbers start at 1
            start_time_seconds=chatMessage.TimestampSeconds,
            end_time_seconds=min(nextTimestampSeconds, chatMessage.TimestampSeconds + max_seconds_onscreen),
            author=chatMessage.Author,
            message_text=chatMessage.MessageText))
    return srtLines
def parse_ass_lines(chat_messages: List[ChatMessage], max_seconds_onscreen: float = 5, grouping_interval_seconds: float = 5, max_subtitles_onscreen: int = 5) -> List[AssLine]:
    if max_seconds_onscreen <= 0:
        raise ValueError(f'max_seconds_onscreen must be positive, but was {max_seconds_onscreen}')
    if grouping_interval_seconds <= 0:
        raise ValueError(f'grouping_interval_seconds must be positive, but was {grouping_interval_seconds}')
    if max_subtitles_onscreen <= 0:
        raise ValueError(f'max_subtitles_onscreen must be positive, but was {max_subtitles_onscreen}')
    assLines: List[AssLine] = []
    if len(chat_messages) == 0:
        return assLines
    minTimestamp = 0
    maxTimestamp = grouping_interval_seconds
    lastTimestamp = chat_messages[-1].TimestampSeconds
    minIndex = 0
    maxIndex = -1
    # <= so a final message landing exactly on an interval boundary is not dropped
    while minTimestamp <= lastTimestamp:
        while maxIndex + 1 < len(chat_messages) and chat_messages[maxIndex + 1].TimestampSeconds < maxTimestamp:
            maxIndex += 1
        commentsInInterval = maxIndex - minIndex + 1
        if commentsInInterval > 0:
            subtitlesPerSecond = commentsInInterval / grouping_interval_seconds
            for i in range(0, commentsInInterval):
                chatMessage = chat_messages[minIndex + i]
                timeOnscreen = min(max_subtitles_onscreen / subtitlesPerSecond, max_seconds_onscreen)
                assLines.append(AssLine(
                    start_time_seconds=chatMessage.TimestampSeconds,
                    end_time_seconds=chatMessage.TimestampSeconds + timeOnscreen,
                    author=chatMessage.Author,
                    message_text=chatMessage.MessageText))
        minIndex = maxIndex + 1
        minTimestamp += grouping_interval_seconds
        maxTimestamp += grouping_interval_seconds
    return assLines
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--url', required=True)
    parser.add_argument('--max_seconds_onscreen', required=False, type=float, default=5)
    parser.add_argument('--smoothing_interval_seconds', required=False, type=float, default=5)
    parser.add_argument('--title', required=False, default='subtitles')
    subparsers = parser.add_subparsers(dest='command', required=True)
    parser_srt = subparsers.add_parser('srt')
    parser_ass = subparsers.add_parser('ass')
    parser_ass.add_argument('--max_subtitles_onscreen', required=False, type=int, default=5)
    args = parser.parse_args()
    chatMessages = parse_chat_messages(ChatDownloader().get_chat(args.url))
    even_spaced_timestamp_filter(chatMessages, args.smoothing_interval_seconds)
    
    if args.command == 'srt':
        lines = parse_srt_lines(chatMessages, args.max_seconds_onscreen)
    elif args.command == 'ass':
        lines = parse_ass_lines(chatMessages, args.max_seconds_onscreen, args.smoothing_interval_seconds, args.max_subtitles_onscreen)
    filePath = os.path.join(os.getcwd(), f'{args.title}.{args.command}')
    with open(filePath, 'w', encoding='utf-8') as file:
        if args.command == 'ass':
            file.write(assHeader)
        for line in lines:
            file.write(line.to_string())
        print(f'Wrote subtitles to {filePath}')
It's similar to xenova's example above, with a few changes:
- Fixes a bug where emojis are left in their raw input form, like :partying_face:, instead of 🥳.
- Fixes a bug where messages whose milliseconds timestamp ended in one or two zeros (such as 32.500 seconds) were output with the trailing zeros removed, which causes them to appear onscreen longer than they should.
- Adds an evenly-spaced timestamp filter, so that when several people post comments at the same time, the spike of comments gets smoothed out over time, which makes comments easier to read.
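To make the smoothing concrete, here is the same formula that `even_spaced_timestamp_filter` uses, reduced to a plain list of floats. `smooth_timestamps` is a hypothetical standalone helper, and it assumes the timestamps are sorted and non-negative: message i of n in a bucket is moved to the centre of the i-th of n equal slots.

```python
import math
from typing import List

def smooth_timestamps(times: List[float], interval: float = 5.0) -> List[float]:
    """Spread the messages in each [k*interval, (k+1)*interval) bucket evenly,
    placing message i of n at the centre of the i-th of n equal slots.
    Assumes `times` is sorted and non-negative."""
    if interval <= 0:
        raise ValueError('interval must be positive')
    result: List[float] = []
    i = 0
    while i < len(times):
        bucket = math.floor(times[i] / interval)
        # Find every message that falls in the same bucket
        j = i
        while j < len(times) and math.floor(times[j] / interval) == bucket:
            j += 1
        n = j - i
        start = bucket * interval
        result.extend(start + (2 * k + 1) * interval / (2 * n) for k in range(n))
        i = j
    return result

print([round(t, 2) for t in smooth_timestamps([1.0, 1.1, 1.2, 7.5])])
# [0.83, 2.5, 4.17, 7.5]
```

Three messages posted within 0.2 s of each other get spread across the first 5-second window, while a lone message ends up in the middle of its own window.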
To use it, run pip install chat-downloader if you haven't already, save the script to a file like srt_subtitle_downloader.py, and then run it:
python srt_subtitle_downloader.py --url https://www.youtube.com/watch?v=k-S4ZRlMf6Q ass
If you run python srt_subtitle_downloader.py --help, it'll print a help menu. The ASS scrolling is vertical (messages start at the bottom of the screen and move upwards), whereas Nico-Nico's is horizontal (messages start at the right of the screen and move left). You can tweak the code to make them scroll horizontally, though.
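For horizontal, Nico-Nico-style scrolling, one approach is to give each comment a `\move` from just past the right edge to just past the left edge, and to spread overlapping comments over horizontal rows ("lanes"). The sketch below assumes the 640x480 `PlayResX`/`PlayResY` and the 26 px `Default` style from the script's header; the lane logic, the `LANE_HEIGHT` spacing and the 0.6 × font-size width estimate are my own rough assumptions, not taken from danmaku2ass:

```python
from typing import List, Tuple

PLAY_RES_X, PLAY_RES_Y = 640, 480  # PlayResX/PlayResY from the script's ASS header
FONT_SIZE = 26                     # Fontsize of the header's Default style
LANE_HEIGHT = FONT_SIZE + 4        # assumed vertical spacing between rows

def ass_time(seconds: float) -> str:
    """Format non-negative seconds as an ASS timestamp, H:MM:SS.cc."""
    total_cs = round(seconds * 100)
    h, rest = divmod(total_cs, 360_000)
    m, rest = divmod(rest, 6_000)
    s, cs = divmod(rest, 100)
    return f"{h}:{m:02}:{s:02}.{cs:02}"

def scrolling_dialogues(messages: List[Tuple[float, str]],
                        duration: float = 8.0) -> List[str]:
    """Build right-to-left scrolling Dialogue lines from sorted (start_seconds, text) pairs."""
    lanes = PLAY_RES_Y // LANE_HEIGHT
    lane_free_at = [0.0] * lanes  # when each lane's latest comment has fully entered the screen
    lines = []
    for start, text in messages:
        width = len(text) * FONT_SIZE * 0.6      # crude width guess; real glyph widths vary
        speed = (PLAY_RES_X + width) / duration  # pixels per second for this comment
        free = [i for i in range(lanes) if lane_free_at[i] <= start]
        # Use the top-most free lane, or the lane that frees up soonest if all are busy
        lane = free[0] if free else min(range(lanes), key=lambda i: lane_free_at[i])
        lane_free_at[lane] = start + width / speed
        y = lane * LANE_HEIGHT
        # \an7 anchors the text's top-left corner: it starts just off the right edge
        # and finishes with its right edge at x = 0 (fully off the left side)
        lines.append(
            f"Dialogue: 0,{ass_time(start)},{ass_time(start + duration)},Default,,0,0,0,,"
            f"{{\\an7\\move({PLAY_RES_X},{y},{-round(width)},{y})}}{text}")
    return lines

for line in scrolling_dialogues([(0.0, 'hello'), (0.5, 'world'), (10.0, 'later')]):
    print(line)
```

One limitation: a long comment moves faster than a short one (it covers more pixels in the same `duration`), so it can catch up with the comment ahead of it in the same lane; a more careful lane check would compare both comments' positions over time.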
@xenova It might be cool to add a --output chat.srt option to chat-downloader itself, as you suggested, but even_spaced_timestamp_filter needs all comments upfront, which doesn't fit the continuous_write streaming abstraction (it assumes output can be written comment by comment). Any thoughts? Possible solutions:
- Add a special case: if --output ends with .srt, download all the chat first and then write it to SRT in one go. This might be a little ugly, but it would sidestep the problem.
- Change continuous_write.py to accept the full list of chat comments. This would be a breaking change in behavior, since the point of continuous_write.py was to support streaming comments as they are downloaded.
- Remove the even_spaced_timestamp_filter. This would allow using the existing output streaming interface, but it makes subtitles harder to read.
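For the first option, the output step could branch on the file extension, roughly as below. `write_chat` and `to_srt` are hypothetical names, not chat-downloader's actual API, and JSON Lines stands in for whatever streaming formats `continuous_write.py` already handles:

```python
import json
from typing import Callable, Dict, Iterable, List

def write_chat(messages: Iterable[Dict], path: str,
               to_srt: Callable[[List[Dict]], str]) -> None:
    """Hypothetical dispatcher: stream line by line for ordinary outputs, but
    buffer everything when the target is SRT, since smoothing needs the whole chat."""
    with open(path, 'w', encoding='utf-8') as f:
        if path.lower().endswith('.srt'):
            # Drain the generator first: the whole chat is held in memory
            f.write(to_srt(list(messages)))
        else:
            for message in messages:
                # Each message is written as soon as it is downloaded
                f.write(json.dumps(message, ensure_ascii=False) + '\n')
```

The trade-off is that an .srt file only appears once the whole chat has been downloaded, and nothing is written if the download is interrupted; the streaming branch keeps today's behavior.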
Oh hey, I made this! Check out https://github.com/9001/softchat
EDIT: The current output modes, -m1 and -m2, are designed for high-speed chats (10+ messages per second) -- I'll add a more conventional subtitle view, -m3, for slower chats :>