Auto-Stream-Recording-Twitch
VOD download is sometimes incomplete due to Twitch batch processing
From some limited experimental observation, it seems that segments are added to the VOD's HLS playlist in batches of approximately 30 segments, and the time between batches is more or less the span of time those segments represent (e.g. if the segments are each 10 seconds, a new batch will be added every ~300 seconds). Note that these segments are not equivalent to the live segments, as those are typically shorter to reduce latency.
Since the script downloads the VOD immediately after the livestream terminates, the resulting VOD file often (and in my experience, usually) is missing a few minutes at the end because the VOD was fetched before the final batch of segments was added. Unfortunately, there doesn't seem to be an API call that would tell us when the VOD has been finalized, but by simply adding an --hls-playlist-reload-attempts parameter, we could ensure with reasonable certainty that the VOD has been finalized before terminating the download, at the expense of ~6-12 minutes of additional runtime for that streamlink process as it checks for a new playlist (~6 minutes if a final batch isn't added, ~12 minutes if it is).
Note: if streamlink checks continuously as segments are downloaded (which I suspect it does--still need to test this), those runtime figures will actually be lower or drop to 0 depending on the user's download speed. For example, if a user saturates their 300 Mbps connection downloading a 15 GB VOD, it will take 15000 MB / (300/8 MB/s) = 400 seconds to download, resulting in no perceived delay if a final batch isn't added or only ~4 minutes of delay if one is.
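To make the arithmetic above easy to rerun with other numbers, here is a minimal sketch. The function names are made up for illustration, and it assumes the download fully saturates the link and that the playlist checks overlap with the download, which the comment above says is still untested. With the rounded 6 and 12 minute figures, the leftover wait comes out at 0 and roughly 5 minutes, in the same ballpark as the estimate above.

```python
def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Seconds needed to download size_mb megabytes over a link_mbps megabit/s link."""
    return size_mb / (link_mbps / 8)  # divide by 8 to convert megabits/s to megabytes/s


def perceived_delay(wait_seconds: float, size_mb: float, link_mbps: float) -> float:
    """Waiting time left over once the download itself has finished (never negative)."""
    return max(0.0, wait_seconds - download_seconds(size_mb, link_mbps))


print(download_seconds(15000, 300))          # 400.0
print(perceived_delay(6 * 60, 15000, 300))   # 0.0
print(perceived_delay(12 * 60, 15000, 300))  # 320.0
```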
Again, based on experimentation, the batch adding seems to happen fairly reliably in 25 to 26 target duration periods (which is the default setting for --hls-playlist-reload-time, as opposed to segment). You could give this a bit of leeway--maybe set the attempt count to 30 or so--and I think this would solve the problem, for the most part. May require some additional testing, as most of the streams I looked at had 10 second segments and I'm not sure (1) how often different segment sizes are used for VODs, (2) what effect this might have on batch frequency, if any, and (3) if the final batch (as opposed to regular batches added to the VOD as a stream is still live, which is necessarily most of my data) takes a different amount of time to process.
The documentation for the two mentioned streamlink parameters, for reference:
--hls-playlist-reload-attempts ATTEMPTS
    How many attempts should be done to reload the HLS playlist before
    giving up.
    Default is 3.
--hls-playlist-reload-time TIME
    Set a custom HLS playlist reload time value, either in seconds
    or by using one of the following keywords:
        segment: The duration of the last segment in the current playlist
        live-edge: The sum of segment durations of the live edge value minus one
        default: The playlist's target duration metadata
    Default is default.
Hello.
Thanks for the detailed issue!
I know about this problem with VODs, but VODs aren't the main target of this script.
Is your solution to add a delay before the end of recording the stream? If so, then if the streamer restarts the stream, then the beginning of the next stream will be cut off. I don't think it's good enough.
The Kraken API response actually does have a "status":"recording" field, but Kraken is a deprecated API. Also, I don't think I should add such a delay to wait until the VOD status becomes "recorded".
I'm going to check this later, but right now I'm working a lot. I appreciate any help!
Just to be clear, I mean that when I have VOD download set to 1, a separate streamlink process launches that downloads the VOD, and that process usually doesn't get the full VOD. Recording the livestreams works totally fine, and I'm not proposing any change to that. It's unfortunate that the new Twitch API seems to be missing so many useful features, seemingly by design.
I actually just tested my solution and...it didn't work. The problem is, since the VODs are not live streams, they have #EXT-X-ENDLIST as the final line in the HLS playlist. Regardless of what reload settings you have, streamlink will download the playlist and terminate without ever attempting to reload, since, as far as it knows, there will never be any new data. And unfortunately, there doesn't seem to be any streamlink argument that allows you to ignore the endlist metadata (makes sense, since this would be operating contrary to the HLS specification).
Barring adding a delay before starting the download (which I agree is not a good solution), the only other solution that comes to mind would be manually removing the last line of the playlist before passing it to streamlink so that it treats it like a livestream. This would only allow one reload (since the unmodified playlist that will be received after reloading will still end with #EXT-X-ENDLIST), so we would have to manually set a high reload time of perhaps 6 or 7 minutes to make sure that this works. However, even though this is conceptually a simple (if very hacky) solution, I'm not sure how you could intercept the playlist to accomplish this or use a local modified playlist with streamlink.
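For reference, the playlist edit itself is only a few lines. The sketch below covers only the text manipulation: detecting the #EXT-X-ENDLIST tag and removing it. It does not answer the open question of how to get streamlink to consume the modified playlist. The function names and the sample playlist are made up for illustration.

```python
ENDLIST_TAG = "#EXT-X-ENDLIST"


def has_endlist(playlist_text: str) -> bool:
    """True if any line of the playlist is the end-of-list tag."""
    return any(line.strip() == ENDLIST_TAG for line in playlist_text.splitlines())


def strip_endlist(playlist_text: str) -> str:
    """Return the playlist without the end-of-list tag, so a client would treat it as live."""
    kept = [line for line in playlist_text.splitlines() if line.strip() != ENDLIST_TAG]
    return "\n".join(kept) + "\n"


# Hypothetical two-segment VOD playlist, shortened for the example.
sample = (
    "#EXTM3U\n"
    "#EXT-X-TARGETDURATION:10\n"
    "#EXTINF:10.0,\n0.ts\n"
    "#EXTINF:10.0,\n1.ts\n"
    "#EXT-X-ENDLIST\n"
)
print(has_endlist(sample))                 # True
print(has_endlist(strip_endlist(sample)))  # False
```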
If that hack isn't feasible (I'll try to play around and see how it might be accomplished), solving this is probably out of scope for this script. In that case, I'll write a script separately that suits my needs. Since all streamlink is doing is concatenating the TS files together and obtaining the static HLS playlist URL for a VOD is trivial, I could imagine a very simple script that downloads the current contents of the HLS playlist, checks periodically for an updated playlist (for Twitch the last-modified header is a reliable indicator), and downloads only the new segments, timing out after a certain number of checks without finding new segments. Besides fixing the issue of getting an incomplete VOD as in this script, it would also allow for VOD downloading to occur in parallel with recording a stream, which would avoid the prolonged spike in bandwidth consumption associated with downloading the entire VOD in one go.
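A rough sketch of that standalone script, under several assumptions: the playlist URL points at a media playlist (not a variant playlist), segment URIs are stable between reloads, and plain concatenation of the TS segments is good enough. The function names, the 60 second poll interval, and the limit of 10 idle checks are placeholders, not tested values.

```python
import time
import urllib.request
from urllib.parse import urljoin


def segment_uris(playlist_text: str) -> list[str]:
    """Segment URIs of a media playlist: non-empty lines that are not tags or comments."""
    lines = (line.strip() for line in playlist_text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def new_segments(playlist_text: str, seen: set[str]) -> list[str]:
    """URIs that have not been downloaded yet, in playlist order."""
    return [uri for uri in segment_uris(playlist_text) if uri not in seen]


def download_vod(playlist_url: str, out_path: str,
                 poll_seconds: float = 60, max_idle_checks: int = 10) -> None:
    """Append segments to out_path until max_idle_checks reloads in a row find nothing new."""
    seen: set[str] = set()
    last_modified = None
    idle = 0
    with open(out_path, "wb") as out:
        while idle < max_idle_checks:
            with urllib.request.urlopen(playlist_url) as resp:
                modified = resp.headers.get("Last-Modified")
                text = resp.read().decode("utf-8")
            # An unchanged Last-Modified header means no new batch was added.
            if modified is not None and modified == last_modified:
                fresh = []
            else:
                fresh = new_segments(text, seen)
            last_modified = modified
            for uri in fresh:
                # Segment URIs may be relative to the playlist URL.
                with urllib.request.urlopen(urljoin(playlist_url, uri)) as seg:
                    out.write(seg.read())
                seen.add(uri)
            idle = 0 if fresh else idle + 1
            if idle < max_idle_checks:
                time.sleep(poll_seconds)
```

Calling download_vod(playlist_url, "vod.ts") while the stream is still live would pick up each batch as it appears, which is what spreads the bandwidth use over the length of the stream.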
Sorry for the long comment again, and thanks so much for creating and maintaining this script! It's really made my life a lot easier, and without it, there are definitely some streams that I would have lost completely.
Isn't it easier to just add a delay to the stream downloading process?
That's probably the more straightforward way. It may be a good idea to grab the HLS playlist URL first before delaying and pass that to streamlink (instead of the twitch.tv/videos/[vod_id] link as is done currently) in case the VOD is deleted from Twitch during the delay, since the playlist and segments are still available for a good while after the VOD has been deleted.
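A sketch of that order of operations (resolve the playlist URL first, wait, then download), assuming streamlink is on the PATH and accepts the resolved .m3u8 URL directly as its input. The helper names and the 420 second default delay are placeholders for illustration; --stream-url and -o are existing streamlink options.

```python
import subprocess
import time


def resolve_cmd(vod_url: str, quality: str = "best") -> list[str]:
    """Command that makes streamlink print the resolved stream URL instead of downloading."""
    return ["streamlink", "--stream-url", vod_url, quality]


def download_cmd(playlist_url: str, out_path: str, quality: str = "best") -> list[str]:
    """Command that downloads from an already resolved HLS playlist URL to a file."""
    return ["streamlink", playlist_url, quality, "-o", out_path]


def download_after_delay(vod_url: str, out_path: str, delay_seconds: float = 420) -> None:
    """Resolve the playlist URL right away, wait for the final batch, then download."""
    result = subprocess.run(resolve_cmd(vod_url), capture_output=True, text=True, check=True)
    playlist_url = result.stdout.strip()
    time.sleep(delay_seconds)  # assumed delay; long enough for one more batch to land
    subprocess.run(download_cmd(playlist_url, out_path), check=True)
```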
I thought about it too. It's a pretty complicated way tho 😂