twitch-dl
[Feature Request] Download clips by Stream ID
Hey, with all this DMCA stuff going on, many streamers have nuked their clips. However, it's still possible to download them using a stream ID (example: https://pastebin.com/H25NPesp). Could such a useful feature be implemented? You can look up stream IDs on twitchtracker.com.
Looking into it.
@ihabunek, here's an example of stream IDs I've scraped from a certain streamer: test.txt
How do you determine which offsets to scan? I found clips at pretty high offsets, 5000+.
If I understand it right, Rydan's solution on Twitter works like this: 'Take that ID and loop through every offset until the stream ends (for a 1h long stream that means offsets 1 to 3600), request each URL via a GET request and check the response for a status code of 200. All the ones with that status code are existing clips that can be downloaded.'
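Rydan's approach, as quoted above, amounts to one status check per second of stream, so a 1-hour stream means 3,600 requests. A minimal Python sketch of that loop follows. The URL pattern is copied from links in this thread and is an observed convention, not a documented Twitch API; `candidate_urls`, `head_status` and `existing_clips` are hypothetical names, not clips-dl's code. It uses a HEAD request, which returns the status without downloading the video, and treats anything other than 200 as "no clip".

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator, List

# Observed URL shape for auto-generated 30-second clips (taken from links in
# this thread); an assumption, not a documented Twitch API.
CLIP_URL = "https://clips-media-assets2.twitch.tv/{stream_id}-offset-{offset}.mp4"


def candidate_urls(stream_id: str, min_offset: int, max_offset: int) -> Iterator[str]:
    """Yield one candidate clip URL per second, min_offset..max_offset inclusive."""
    for offset in range(min_offset, max_offset + 1):
        yield CLIP_URL.format(stream_id=stream_id, offset=offset)


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request (no body is downloaded)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers arrive as exceptions; their status code is still useful.
        return error.code


def existing_clips(
    urls: Iterable[str], status_of: Callable[[str], int] = head_status
) -> List[str]:
    """Keep only the URLs whose status is 200, i.e. clips that still exist."""
    return [url for url in urls if status_of(url) == 200]
```

Usage: `existing_clips(candidate_urls("40284574190", 2400, 3300))` checks 901 offsets one after another. Passing `status_of` as a parameter keeps the loop testable without network access.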
I don't want to add this to twitch-dl until I fully understand the logic so I've made a new project to play around with it.
You need Python 3.5+. Download the package here.
To test, try this out:
python3 ./clips-dl.0.2.0.pyz 787034579 --min-offset 2400 --max-offset 3300 > urls.txt
Note that 787034579 is the video ID, NOT the stream ID. I fetch the stream ID from Twitch.
urls.txt should contain the URLs of clips in the given offset range. You can download them with wget or similar (I get five of them for the above command):
wget -i urls.txt
You can leave out the min and max offset, and they will be set to 0 and video length in seconds respectively.
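For anyone without wget, here is a rough Python equivalent of `wget -i urls.txt`: read one URL per line and save each file under the last path component of its URL. This is a sketch, not part of clips-dl; `read_urls`, `local_name` and `download_all` are hypothetical helpers, and it downloads sequentially with no retries.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def read_urls(text: str) -> List[str]:
    """One URL per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url: str) -> str:
    """'https://host/40284574190-offset-2496.mp4' -> '40284574190-offset-2496.mp4'"""
    return Path(urlparse(url).path).name


def download_all(urls_file: Path, out_dir: Path) -> None:
    """Download every URL listed in urls_file into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in read_urls(urls_file.read_text(encoding="utf-8")):
        target = out_dir / local_name(url)
        if target.exists():
            continue  # already downloaded on a previous run
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url, timeout=30) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)  # only complete downloads get the final name
```

Usage: `download_all(Path("urls.txt"), Path("clips"))`. Writing to a `.part` file first means an interrupted run never leaves a truncated file that the `exists()` check would later skip.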
Hmm, tried it out with this VOD:
python3 ./clips-dl.0.2.0.pyz 798052582 --min-offset 2400 --max-offset 3300 > urls.txt
INFO:__main__:Fetching stream id for video ID: 798052582
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "./clips-dl.0.2.0.pyz/__main__.py", line 108, in <module>
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "./clips-dl.0.2.0.pyz/__main__.py", line 79, in main
File "./clips-dl.0.2.0.pyz/__main__.py", line 43, in get_video_info
IndexError: list index out of range
Works with your example, though
Same output with a different VOD, without offset parameters:
python3 ./clips-dl.0.2.0.pyz 798052582 > urls.txt
INFO:__main__:Fetching stream id for video ID: 798052582
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "./clips-dl.0.2.0.pyz/__main__.py", line 108, in <module>
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "./clips-dl.0.2.0.pyz/__main__.py", line 79, in main
File "./clips-dl.0.2.0.pyz/__main__.py", line 43, in get_video_info
IndexError: list index out of range
I think I know what the issue is, but I don't have any more time to work on this today. Paid job beckons. I'll get around to it later.
Okay, thanks for helping out! I'm currently playing around with getting stream IDs from old VODs that are gone, in order to scan them for clips that are still there. Is there any way to convert an old VOD ID (e.g. 78349737) into a stream ID? For me it returns {'data': {'video': None}.
Despite not knowing Python, I tried to debug the clips-dl bundle you provided. It appears it didn't work with 798052582 because the stream was still ongoing. I also made a mistake and checked this VOD ID twice; I was a little distracted. The response didn't contain the stream ID:
INFO:__main__:Fetching stream id for video ID: 798052582
INFO:__main__:{'data': {'video': {'previewThumbnailURL': 'https://vod-secure.twitch.tv/_404/404_processing_{width}x{height}.png', 'lengthSeconds': 7799}}, 'extensions': {'durationMilliseconds': 17, 'requestID': '01EPSD3A4ZTNX9SDNY89JMPWDQ'}}
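The log shows why no stream ID came back: while the stream was ongoing, `previewThumbnailURL` was a `_404/404_processing` placeholder. Below is a sketch of extracting a stream ID from a thumbnail URL while treating that placeholder as "not available yet". It assumes, without confirmation from this thread, that a regular VOD thumbnail path contains a segment shaped like `<hash>_<channel>_<stream_id>_<timestamp>`; `stream_id_from_thumbnail` is a hypothetical name, not clips-dl's code.

```python
from typing import Optional
from urllib.parse import urlparse


def stream_id_from_thumbnail(thumbnail_url: str) -> Optional[str]:
    """Return the stream ID embedded in a VOD thumbnail URL, or None.

    None covers Twitch's '_404/404_processing' placeholder (seen while the
    stream was still live) and any URL that does not match the assumed shape.
    """
    path = urlparse(thumbnail_url).path
    if "/_404/" in path or "404_processing" in path:
        return None
    # ASSUMPTION: one path segment looks like <hash>_<channel>_<stream_id>_<timestamp>.
    # Splitting from the right keeps channel names that contain '_' intact.
    for segment in path.split("/"):
        parts = segment.rsplit("_", 2)
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            return parts[1]
    return None
```

Splitting from the right is a deliberate choice: a left-to-right split would pick the wrong field whenever the channel name itself contains underscores.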
So that wasn't a bug in clips-dl, sorry. I also got a bunch of errors on an unstable connection, but it's not a big deal:
DEBUG:__main__:GET https://clips-media-assets2.twitch.tv/1603370954-offset-10219.mp4
Traceback (most recent call last):
File "clip-dl.pyz/httpx/_exceptions.py", line 326, in map_exceptions
File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
File "clip-dl.pyz/httpcore/_async/connection_pool.py", line 218, in arequest
File "clip-dl.pyz/httpcore/_async/connection.py", line 105, in arequest
File "clip-dl.pyz/httpcore/_async/http11.py", line 72, in arequest
File "clip-dl.pyz/httpcore/_async/http11.py", line 133, in _receive_response
File "clip-dl.pyz/httpcore/_async/http11.py", line 172, in _receive_event
File "clip-dl.pyz/httpcore/_backends/asyncio.py", line 150, in read
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "clip-dl.pyz/httpcore/_exceptions.py", line 12, in map_exceptions
httpcore.ReadError: [Errno 104] Connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "clip-dl.pyz/__main__.py", line 108, in <module>
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "clip-dl.pyz/__main__.py", line 87, in main
File "clip-dl.pyz/__main__.py", line 73, in find_clips
File "clip-dl.pyz/__main__.py", line 52, in process_segment
File "clip-dl.pyz/httpx/_client.py", line 1602, in head
File "clip-dl.pyz/httpx/_client.py", line 1371, in request
File "clip-dl.pyz/httpx/_client.py", line 1406, in send
File "clip-dl.pyz/httpx/_client.py", line 1444, in _send_handling_auth
File "clip-dl.pyz/httpx/_client.py", line 1476, in _send_handling_redirects
File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "clip-dl.pyz/httpx/_exceptions.py", line 343, in map_exceptions
httpx.ReadError: [Errno 104] Connection reset by peer
DEBUG:__main__:GET https://clips-media-assets2.twitch.tv/1603370954-offset-17993.mp4
Traceback (most recent call last):
File "clip-dl.pyz/httpx/_exceptions.py", line 326, in map_exceptions
File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
File "clip-dl.pyz/httpcore/_async/connection_pool.py", line 218, in arequest
File "clip-dl.pyz/httpcore/_async/connection.py", line 105, in arequest
File "clip-dl.pyz/httpcore/_async/http11.py", line 72, in arequest
File "clip-dl.pyz/httpcore/_async/http11.py", line 133, in _receive_response
File "clip-dl.pyz/httpcore/_async/http11.py", line 172, in _receive_event
File "clip-dl.pyz/httpcore/_backends/asyncio.py", line 150, in read
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "clip-dl.pyz/httpcore/_exceptions.py", line 12, in map_exceptions
httpcore.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "clip-dl.pyz/__main__.py", line 108, in <module>
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "clip-dl.pyz/__main__.py", line 87, in main
File "clip-dl.pyz/__main__.py", line 73, in find_clips
File "clip-dl.pyz/__main__.py", line 52, in process_segment
File "clip-dl.pyz/httpx/_client.py", line 1602, in head
File "clip-dl.pyz/httpx/_client.py", line 1371, in request
File "clip-dl.pyz/httpx/_client.py", line 1406, in send
File "clip-dl.pyz/httpx/_client.py", line 1444, in _send_handling_auth
File "clip-dl.pyz/httpx/_client.py", line 1476, in _send_handling_redirects
File "clip-dl.pyz/httpx/_client.py", line 1502, in _send_single_request
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "clip-dl.pyz/httpx/_exceptions.py", line 343, in map_exceptions
httpx.ReadTimeout
Everything works very well on a good connection, albeit only for one VOD at a time. I hope this isn't asking too much, but could there be an option to pass a list of stream IDs from a .txt file, one stream ID per line, with the program writing its output to urls-{stream_id}.txt? I tried to do it myself but couldn't figure it out; I'd really appreciate help with this.
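The batch mode requested above boils down to a loop over a file of stream IDs, with one output file per stream. A sketch, assuming a hypothetical `find_urls(stream_id)` callable that does the actual scanning; as the reply below points out, a stream ID alone does not give the stream duration, so that scanner still needs the length from somewhere.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def parse_stream_ids(text: str) -> List[str]:
    """One stream ID per line; blank lines and stray whitespace are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def scan_streams(
    stream_ids: Iterable[str],
    find_urls: Callable[[str], List[str]],  # hypothetical per-stream scanner
    out_dir: Path,
) -> Dict[str, int]:
    """Write urls-{stream_id}.txt for every stream; return URL counts per stream."""
    counts: Dict[str, int] = {}
    for stream_id in stream_ids:
        urls = find_urls(stream_id)
        out_file = out_dir / f"urls-{stream_id}.txt"
        out_file.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
        counts[stream_id] = len(urls)
    return counts
```

Each file is written as soon as its stream finishes, so a failure later in the batch only loses the stream that was in progress.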
@Daniil1288 I added some retries and hopefully fixed the stream_id parsing logic. https://git.sr.ht/~ihabunek/clips-dl/refs/0.2.1/clips-dl.0.2.1.pyz
About your request, I can do that. Could you also provide the stream duration? I don't have a simple way to get stream duration from a stream_id.
BTW, have you noticed that all clips downloaded this way are 30 seconds long? I'm also getting some duplication, e.g. in the example I sent before I get these clips:
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2496.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2694.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-2698.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-3078.mp4
INFO:clips-dl:Found clip: https://clips-media-assets2.twitch.tv/40284574190-offset-3244.mp4
Clips at offsets 2694 and 2698 overlap.
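If every auto-generated clip is 30 seconds long, two clips overlap whenever their offsets are less than 30 apart. Assuming each clip starts at its offset, 2694 covers 2694 to 2724 and 2698 covers 2698 to 2728, sharing 26 seconds. A sketch that groups offsets into overlapping runs, with the 30-second length taken from the observation above rather than any documented guarantee; `group_overlapping` is a hypothetical helper.

```python
from typing import List

CLIP_SECONDS = 30  # observed length of auto-generated clips (assumption)


def group_overlapping(offsets: List[int], duration: int = CLIP_SECONDS) -> List[List[int]]:
    """Group offsets so that each clip overlaps the previous one in its group.

    Two clips of equal duration overlap when their offsets differ by less
    than `duration`, regardless of exactly how a clip is aligned to its offset.
    """
    groups: List[List[int]] = []
    for offset in sorted(offsets):
        if groups and offset - groups[-1][-1] < duration:
            groups[-1].append(offset)
        else:
            groups.append([offset])
    return groups
```

For the five clips listed above, this yields `[[2496], [2694, 2698], [3078], [3244]]`, so only the 2694/2698 pair needs deduplicating.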
Hey @ihabunek, I guess it would be easier to add an argument that takes hours and converts them into seconds. In my case, the average stream length is 8 hours, so I have to check offsets from 1 to 28800. The only way to get the length accurately would be to use a third-party website like twitchtracker.com, which is where I got my stream IDs from. Their https://twitchtracker.com/{streamername}/streams page gives all the required information for all of a streamer's streams in one request, like this:
<div class="hidden text-uppercase">
| <span class="to-date">2016-11-17</span>
| <br>Live for 7 hour(s)
| <br>Viewers: <span class="to-number">18928</span>
| <br><small class="text-warning">click to show details</small>
</div>
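A rough sketch of turning such a snippet into a scan range: pull the date and the "Live for N hour(s)" figure with regular expressions, then convert hours to a maximum offset in seconds (8 hours is 8 × 3600 = 28800). The markup is copied from the snippet above; TwitchTracker's real HTML may differ or change, in which case the function returns None. Because the figure is given in whole hours, the sketch adds one hour of margin by default, which is an assumption, not something the site documents. `parse_stream_card` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

DATE_RE = re.compile(r'<span class="to-date">(\d{4}-\d{2}-\d{2})</span>')
HOURS_RE = re.compile(r"Live for (\d+) hour\(s\)")


def hours_to_seconds(hours: float) -> int:
    """Convert a duration in hours to whole seconds."""
    return int(hours * 3600)


def parse_stream_card(html: str, margin_hours: int = 1) -> Optional[Tuple[str, int]]:
    """Return (date, max_offset_seconds) from one stream card, or None if absent.

    The reported duration is a whole number of hours, so `margin_hours` extra
    hours are scanned to avoid missing clips near the end of the stream.
    """
    date = DATE_RE.search(html)
    hours = HOURS_RE.search(html)
    if date is None or hours is None:
        return None
    return date.group(1), hours_to_seconds(int(hours.group(1)) + margin_hours)
```

For the card above, `parse_stream_card(card)` gives `("2016-11-17", 28800)`: 7 reported hours plus one hour of margin.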
The clips downloaded this way are 30 seconds long because that's how non-customized clips are stored. Customized clips with a custom length have this format: AT-cm_363621067, but I don't think there's a way to brute-force these easily, if at all, since the number looks random and isn't associated with any streamer. Here's Rydan explaining it on Twitter. As he says in that conversation, there's also a third type of clip link, with /raw_media/ in it, which is created when you make a clip. It's the 90-second version that you crop when customizing a clip. Here's an example: https://clips-media-assets2.twitch.tv/raw_media/39947855852-offset-10942.mp4 (I just made one now). I haven't been able to find working ones like this on the internet, though; I might play around with it to see if it's possible to scan for a 'raw media' version on old streams, too.
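The two offset-based URL forms described above differ only by a `raw_media/` path segment, so a scanner could probe both for each offset. A sketch that builds both candidates; the patterns are copied from the example links in this thread, and whether raw_media URLs survive for old streams is exactly what is still untested. Custom clips (`AT-cm_<number>`) contain no offset and cannot be generated this way. `clip_urls` is a hypothetical helper.

```python
from typing import Dict

BASE = "https://clips-media-assets2.twitch.tv"


def clip_urls(stream_id: str, offset: int) -> Dict[str, str]:
    """Candidate URLs for one offset: the 30-second clip and the raw_media
    version (90 seconds, per the description above)."""
    name = f"{stream_id}-offset-{offset}.mp4"
    return {
        "clip": f"{BASE}/{name}",
        "raw_media": f"{BASE}/raw_media/{name}",
    }
```

Probing both forms doubles the number of requests per stream, so it may be worth enabling the raw_media check only as an option.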
The way you get the stream ID, which you've called 'hacky' in the recent commit, is actually perfectly normal; IIRC that's how most Twitch clip downloaders used to work, i.e. by splitting the thumbnail URL.

In fact, that's how I downloaded what's left of my clip collection from https://dashboard.twitch.tv/u/daniil1288/content/clips. There was no button or program to download all clips in one go as a viewer, which is why I kept putting it off until it was too late and most of them got nuked. I had to scroll down until the page loaded about 950 clips and refused to go any further, save the HTML, then reverse the 'Created' order, load 950 clips and save the HTML again. Then I split the lines to get a list of thumbnails like https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528-preview-260x147.jpg, turned each one into https://clips-media-assets2.twitch.tv/vod-85620741-offset-17528.mp4, and finally threw the links into JDownloader to get them all. Thankfully I had only 1.7k clips left; I'm not sure it's even possible to load clips past the 950-1000 mark, not to mention that the website lagged and often refused to load more than 100 or 150 clips at once.

Also, there are clips out there (not deleted ones) which don't have a thumbnail for some reason and return https://static-cdn.jtvnw.net/ttv-static/404_preview-160x90.jpg, just thought I'd let you know. Twitch is weird.

Anyway, thanks for taking the time to help me out with this! The brute force actually helps a lot for those who watch smaller (sub-1k viewers) streamers who had to delete all their clips and didn't back up the years of content they produced on the platform.
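The thumbnail-to-video rewrite described above is a single regex substitution: drop the `-preview-WIDTHxHEIGHT.jpg` suffix and append `.mp4`. A sketch, assuming thumbnail names follow the example above. Thumbnails pointing at the `404_preview` placeholder carry no clip information, so they are skipped, and duplicates are dropped because the two saved pages (normal and reversed order) can overlap. `thumbnail_to_mp4` and `clip_links` are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

PREVIEW_RE = re.compile(r"-preview-\d+x\d+\.jpg$")


def thumbnail_to_mp4(thumbnail_url: str) -> Optional[str]:
    """'...offset-17528-preview-260x147.jpg' -> '...offset-17528.mp4'.

    Returns None for the 404 placeholder or any URL not matching the pattern.
    """
    if "404_preview" in thumbnail_url or not PREVIEW_RE.search(thumbnail_url):
        return None
    return PREVIEW_RE.sub(".mp4", thumbnail_url)


def clip_links(thumbnail_urls: Iterable[str]) -> List[str]:
    """Convert a batch of thumbnails, dropping duplicates and unusable ones."""
    seen = set()
    links = []
    for thumb in thumbnail_urls:
        mp4 = thumbnail_to_mp4(thumb)
        if mp4 is not None and mp4 not in seen:
            seen.add(mp4)
            links.append(mp4)
    return links
```

The resulting list can be written one URL per line and fed to JDownloader or `wget -i`.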
https://git.sr.ht/~ihabunek/clips-dl/refs/0.3.0/clips-dl.0.3.0.pyz
I added searching for clips across a whole channel.
λ ./clips-dl.0.3.0.pyz channel -h
usage: clips-dl channel [-h] [-s SKIP] [-w WORKERS] [-v] [-q] channel_name
positional arguments:
channel_name Channel name
optional arguments:
-h, --help show this help message and exit
-s SKIP, --skip SKIP Number of streams to skip (default 0)
-w WORKERS, --workers WORKERS
Number of concurrent downloads (default: 25)
-v, --verbose Verbose logging
-q, --quiet Disable logging
For example:
λ ./clips-dl.0.3.0.pyz channel bananasaurus_rex > urls.txt
It still just outputs URLs; it does not download anything.
You can still run it for a single video ID:
λ ./clips-dl.0.3.0.pyz video 12345
Tips:
- use -v to show each URL checked.
- use -w to tweak the number of concurrent downloads; the optimal value may depend on your network connection
- use -s to skip a number of streams, e.g. if the program breaks and you want to resume where you left off
You've hardcoded the bananasaurus_rex argument by mistake, I think:
streams = await find_streams(client, "bananasaurus_rex")
Otherwise it works fine after I changed that, thanks a lot! I started getting HTTP 503 at some point, though, which is unhandled and throws an exception. I had to decrease the number of workers to 20 or 15, but I'm still getting it. Could you handle that exception so it retries when it encounters a 503? It would also be cool to be able to specify the timeout and the number of retries via arguments.
Got a 503 on the fourth stream after letting it run for an hour with -w 10.
LOL about the hardcoded channel. I'll handle the exception. Don't know if I'll have time today; probably over the weekend.
Hello, hope you're doing well. Is there a chance you'll handle the exception any time soon? It would be very usable if it weren't crashing every now and then.
@Daniil1288 Sorry, got distracted by other stuff. Here's a new version which should solve the retries (retry on any error, not only timeout), removes the hardcoded channel name and adds options for retries and timeout.
https://git.sr.ht/~ihabunek/clips-dl/refs/0.4.0/clips-dl.0.4.0.pyz
λ ./clips-dl.0.4.0.pyz channel --help
usage: clips-dl channel [-h] [-s SKIP] [-w WORKERS] [-v] [-q] [-t TIMEOUT] [-r RETRIES] channel_name
positional arguments:
channel_name Channel name
optional arguments:
-h, --help show this help message and exit
-s SKIP, --skip SKIP Number of streams to skip (default 0)
-w WORKERS, --workers WORKERS
Number of concurrent downloads (default: 25)
-v, --verbose Verbose logging
-q, --quiet Disable logging
-t TIMEOUT, --timeout TIMEOUT
HTTP timeout in seconds (default: 10
-r RETRIES, --retries RETRIES
How many times to retry failed requests (default: 5
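Version 0.4.0 retries on errors, but the report below shows an HTTP 500 status still aborting the run. Below is a sketch of a retry policy that treats timeouts, connection resets and transient 5xx statuses alike, with exponential backoff between attempts. `probe` is a hypothetical coroutine returning an HTTP status code; with httpx, which the tracebacks show clips-dl bundling, it could await `client.head(url)` and return `response.status_code`. The status set and backoff values are assumptions, not clips-dl's defaults.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: statuses worth retrying because they signal a temporary server problem.
TRANSIENT_STATUSES = {500, 502, 503, 504}


class TransientStatusError(Exception):
    """Raised when retries run out while the server keeps answering 5xx."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status} after all retries")
        self.status = status


async def status_with_retries(
    probe: Callable[[], Awaitable[int]],
    retries: int = 5,
    base_delay: float = 1.0,
) -> int:
    """Await `probe()` until it returns a non-transient status code.

    Exceptions (timeouts, connection resets, ...) and transient 5xx statuses
    are retried up to `retries` times, sleeping base_delay * 2**attempt
    seconds in between. Any other status (200, 404, ...) is returned at once.
    """
    for attempt in range(retries + 1):
        try:
            status = await probe()
        except Exception as exc:  # any transport error counts as transient
            failure: Exception = exc
        else:
            if status not in TRANSIENT_STATUSES:
                return status
            failure = TransientStatusError(status)
        if attempt == retries:
            raise failure
        await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 0")
```

Returning a 404 immediately matters: a missing clip is the normal answer for most offsets and should never be retried.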
Hey, I ran it for a while; it managed to do thirty or so VODs and then errored out with this:
Traceback (most recent call last):
File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 225, in <module>
File "C:\Users\...\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 178, in channel
File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 131, in find_clips
File "C:\clips\clips-dl.0.4.0.pyz\__main__.py", line 117, in process_segment
ValueError: Unhandled HTTP status: 500
Twitch was down for a while; that's why. It's kind of a shame that it also didn't save the output anywhere after erroring out. Interestingly, I've also found that my old clip URLs that I grabbed using version 0.2.0 don't work any more and display AccessDenied instead. Hopefully Twitch didn't patch this; I'll report back once the script gets to older VODs.
Okay, it still works; it looks like some older clips just got deleted. Still, for me it errors out on HTTP 500 after a while. I let it run again and it happened again, without saving the output anywhere. Could you handle HTTP 500 too, so it retries when it encounters this error? Thanks!
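Both complaints here, losing all output on a crash and resuming by hand with `-s`, can be addressed by writing results as they are found. Below is a sketch of that idea, assuming a hypothetical layout of one `urls-{stream_id}.txt` per stream plus a `done.txt` checkpoint listing finished stream IDs; none of this is an existing clips-dl option, and `scan` stands in for the real per-stream scanner.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(checkpoint: Path) -> Set[str]:
    """Stream IDs already finished on a previous run."""
    if not checkpoint.exists():
        return set()
    lines = checkpoint.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def append_line(path: Path, line: str) -> None:
    """Append one line; closing the file right away flushes it to disk."""
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def scan_resumable(
    stream_ids: Iterable[str],
    scan: Callable[[str], Iterable[str]],  # hypothetical per-stream scanner
    out_dir: Path,
) -> List[str]:
    """Scan streams not yet in done.txt; return the IDs attempted this run."""
    checkpoint = out_dir / "done.txt"
    done = load_done(checkpoint)
    todo = [sid for sid in stream_ids if sid not in done]
    for sid in todo:
        urls_file = out_dir / f"urls-{sid}.txt"
        urls_file.write_text("", encoding="utf-8")  # restart a half-done stream cleanly
        for url in scan(sid):
            append_line(urls_file, url)  # saved immediately, survives a crash
        append_line(checkpoint, sid)  # marked done only after the whole stream
    return todo
```

Rerunning the same command after a crash skips finished streams and rescans only the one that was interrupted, so no manual `-s` count is needed.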