
Resumable Downloads

Open Kethsar opened this issue 4 years ago • 12 comments

Just write the most recently grabbed audio and video fragments to a file, lol. Probably .<videoid>.state or something in the final dir the download will be placed in. Also put the tmpdir name in it, I guess.

Kethsar avatar Jun 06 '21 01:06 Kethsar

Is there any progress on this front? With my unreliable internet connection, downloads dropping due to connection failures is almost a daily occurrence; even a 15s outage will cause the program to terminate. After that, it's always stressful wondering whether a second instance will be able to catch up in time if a stream is not going to be archived (the primary use case for the program).

If getting it to restart at the correct point automatically is too difficult, an option to begin downloading at a specified audio and video segment would be almost as good for me personally. I don't mind extra work muxing afterward (there is unlimited time to do so), but I don't want to spend valuable time while a stream is live re-downloading the beginning and not managing to catch up before it gets privated.

fren-archivist avatar Aug 29 '21 16:08 fren-archivist

I wouldn't call it hard at all, I am just a lazy fuck. I don't plan to add any new features until I finish the golang rewrite, and I have been hard slacking on that. Sorry.

Kethsar avatar Aug 29 '21 18:08 Kethsar

Isn't the go rewrite finished now? Would love to have this feature.

archiif avatar Apr 07 '22 18:04 archiif

It's been done for a while now, yes. I've been pondering this as well, and I might instead try to make it so it won't stop when errors are happening from timeouts after you've lost internet connection, or something. Will likely still do the original plan as well regarding a state file, but the tricky part is what happens if you lose power to your computer while downloading. If a fragment was in the middle of being written to the main file, resuming will end up with a corrupted file that has a partial fragment. I am uncertain if I can detect and fix that offhand, either.

Kethsar avatar Apr 07 '22 18:04 Kethsar

@Kethsar It might work if you wait for the fragment to finish downloading, fsync it, and then write to the state file?

TheTechRobo avatar Apr 07 '22 20:04 TheTechRobo

@TheTechRobo not unless that magically makes it so all the data would somehow be there if power was cut mid-operation. I don't think there is any way to guarantee data is written if your computer loses power at any point in the process as it's not an instant operation.

Kethsar avatar Apr 07 '22 20:04 Kethsar

Ohhh, I didn't realise you were meaning writing to the main file. That's definitely tricky. 🤔

TheTechRobo avatar Apr 07 '22 20:04 TheTechRobo

What about a mode where the pieces don't get merged until the final step? This way, it should be easy to detect any incomplete pieces and redownload them.

archiif avatar Apr 08 '22 15:04 archiif

Ah yes, I pretty much always use the flag for in-memory fragments to lessen disk reads/writes. I suppose merging all file fragments at the end is the default for youtube-dl, isn't it? I never liked it personally since it would leave a massive number of files behind if something went wrong, but I suppose I can do that for resumability.

Kethsar avatar Apr 08 '22 18:04 Kethsar

> If a fragment was in the middle of being written to the main file, resuming will end up with a corrupted file that has a partial fragment

Isn't this simple to solve? It seems like you'd just need to save the size of each transport stream to the state file along with the fragment number. Then when you try to resume it the program truncates any extra data at the end before it starts downloading and concatenating.

fren-archivist avatar May 26 '22 19:05 fren-archivist

That is a good point and has a decent chance of working in most cases. Noted.

Kethsar avatar May 26 '22 21:05 Kethsar

Unless I'm missing something, the only case I think you'd run into trouble with this (assuming you can trust the storage device and filesystem and no other programs mess with the files) is if the power loss was specifically when writing the state file instead of the fragments. That's quite rare, but is also something you can work around just by using redundancy.

First off, keep separate filename.f140.state and filename.f###.state files for the audio and video to avoid any weird race conditions. When writing a new state file for either format, first move the old state file to X.state.safe, then write the new state file to X.state, then delete X.state.safe. When you try to resume a download, first try from X.state.safe. If that file is not present (which should be most of the time) then you can conclude X.state was successfully fully written and resume based on that. Any performance implications should be negligible because this is all happening in the merger threads which are mostly just sitting and waiting anyway. If X.state.safe is present but X.state was fully written, you will end up redownloading one extra fragment, which is not a concern.

fren-archivist avatar May 26 '22 21:05 fren-archivist

You can use SQLite for the state file; it is more reliable than fwrite. Insert metadata for every piece into the state, and on resume verify the metadata and truncate the file if needed.

To avoid cgo for SQLite, there is https://github.com/dgraph-io/badger or https://github.com/etcd-io/bbolt

ghost avatar Mar 08 '23 17:03 ghost