libtorrent icon indicating copy to clipboard operation
libtorrent copied to clipboard

Rethink "resume data" behavior

Open glassez opened this issue 3 years ago • 36 comments

Currently libtorrent sees the resume data only as a hint that can save some of the work done before. If something is wrong with it, libtorrent blindly discards it and starts from scratch, as for a newly added torrent. This looks extremely illogical from user side. Resume data is considered as an opportunity to logically extend the session lifetime to several consecutive "physical" sessions. Therefore, the resume data must be processed in a different way. To begin with, any error in their format/layout should lead to the failure of adding the torrent (for example, if some progress is set but there is no metadata, or if the size of the array of downloaded pieces contradicts the metadata, then this clearly indicates errors on the calling side, so libtorrent should not just swallow it). If resume data is provided and it has a consistent format, then it should be taken as a basis regardless of other circumstances. That is, if there is some progress, the torrent should be created (restored) with exactly this progress, and only then libtorrent should check it (for existing files etc.) and treat the errors as if they occurred during its normal processing (well, maybe a little differently, the main thing is that it should not reset the specified progress without an explicit call from the user side), i.e. just pause the torrent and set the error state.

glassez avatar Sep 18 '20 07:09 glassez

@Chocobo1, please, participate too.

glassez avatar Sep 18 '20 08:09 glassez

it sounds like you're asking that the "check-resume" step not validate the resume data against files on disk, but only for its internal consistency.

It sounds reasonable. To preserve backwards compatibility, it could be controller by a flag. Perhaps two flags, like this:

  • dont_verify_files - don't look at the disk when verifying resume data (i.e. don't really verify it)
  • strict_consistency - be strict when interpreting the resume data and any internal inconsistency is a hard error

arvidn avatar Sep 19 '20 08:09 arvidn

it sounds like you're asking that the "check-resume" step not validate the resume data against files on disk

Not exactly. I think it still makes sense to validate the resume data against files on disk. The main difference from the current behavior is that it should not reset progress in case of failure, but only set an error status and pause the torrent. The main use case is saving torrents to an external/network disk. The user starts the app and the disk is not connected for some reason, so the user sees their torrents in an error state, then it connects the disk and resumes their torrents, without losing the current progress and without rechecking.

To preserve backwards compatibility, it could be controller by a flag.

Ok.

Perhaps two flags

I would make strict checking for consistency to be mandatory.

glassez avatar Sep 19 '20 12:09 glassez

it sounds like you're asking that the "check-resume" step not validate the resume data against files on disk

Not exactly. I think it still makes sense to validate the resume data against files on disk. The main difference from the current behavior is that it should not reset progress in case of failure, but only set an error status and pause the torrent.

I think in practice it's the same thing. Just as if you would unmount the drive while running, the error would happen the next time the file is attempted to be accessed. It would be a bit simpler to implement, to defer the disk error until it's actually accessed.

arvidn avatar Sep 19 '20 14:09 arvidn

actually. a flag that means "don't validate any of the resume data against files on disk" would be sufficient for your use case, I think. It would have very similar behavior as:

  • stopping the torrent
  • unmounting the disk
  • resume the torrent

libtorrent would still assume the files were there when the torrent is resumed, but no error would happen until a peer requested data that have to be read from disk. When such error happens, it doesn't clear any state about the torrent, it just stops it and sets it in an error state.

The only meaningful thins I can think of a second flag could do would be to ensure that the file priority vector has a valid number of slots in it, or the have_pieces vector, or the info-hash matches the torrent_info. I don't really see how any of those things are particularly important. Currently they are just treated leniently, and just "fixed-up" and ignored essentially.

My impression is that the main feature you're looking for is ti stop the torrent if the files are missing, for which the first flag would be sufficient.

arvidn avatar Sep 20 '20 13:09 arvidn

would you mind testing this? https://github.com/arvidn/libtorrent/pull/5173

arvidn avatar Sep 20 '20 20:09 arvidn

The only meaningful thins I can think of a second flag could do would be to ensure that the file priority vector has a valid number of slots in it, or the have_pieces vector, or the info-hash matches the torrent_info. I don't really see how any of those things are particularly important. Currently they are just treated leniently, and just "fixed-up" and ignored essentially.

My impression is that the main feature you're looking for is ti stop the torrent if the files are missing, for which the first flag would be sufficient.

In fact, these are two different issues. And the second one is also very important. As I mentioned above, inconsistent resume data in the add_torrent_params clearly indicates some serious problem in the application workflow (for example, a logical error in the code, data corruption on disk, etc.). So it's a very bad idea to just ignore it. This behavior can only be justified if this inconsistency is acceptable under normal operating conditions of the application and you are sure that the fallback really suits users in this case. But even if some other library clients need the existing behavior, qBittorrent is clearly not one of them. So please consider the question about the solution of this problem at least optional via a flag.

would you mind testing this? #5173

Of course. I'll do it in a next few days.

glassez avatar Sep 21 '20 10:09 glassez

Currently it's possible to specify the piece- and file priorities for more pieces and files than there are in the torrent. This is specifically to support affecting those priorities for magnet links, before it's know how many files or pieces there are. So I think at least those have to have a "forgiving" treatment.

The info-hash matching is mandatory, I'm pretty sure.

Any issue in the bencoded structure is handled before the add_torrent_params is added, so those errors would happen earlier.

The remaining issues would be in add_torrent_params consistency. Maybe that only valid priority values are specified.

arvidn avatar Sep 22 '20 05:09 arvidn

Currently it's possible to specify the piece- and file priorities for more pieces and files than there are in the torrent. This is specifically to support affecting those priorities for magnet links, before it's know how many files or pieces there are. So I think at least those have to have a "forgiving" treatment.

Theoretically, it may have some meaning. But I can't imagine its practical use. In any case, it can be treated as a special case (i.e., don't check priorities if there is no metadata).

The remaining issues would be in add_torrent_params consistency. Maybe that only valid priority values are specified.

I care more about downloaded pieces. This should definitely cause an error if they are set without metadata, or if their layout contradicts the metadata.

glassez avatar Sep 22 '20 11:09 glassez

actually. a flag that means "don't validate any of the resume data against files on disk" would be sufficient for your use case, I think. It would have very similar behavior as:

  • stopping the torrent
  • unmounting the disk
  • resume the torrent

libtorrent would still assume the files were there when the torrent is resumed, but no error would happen until a peer requested data that have to be read from disk. When such error happens, it doesn't clear any state about the torrent, it just stops it and sets it in an error state.

The only thing that confuses me is the stopped torrents. What if all torrents from a detached disk are stopped? Then the user will not know about this fact. On the other hand, this can be considered a feature. The user can store their archived torrents on removable disks and not care whether they are attached or not. If he wants to resume one of them, he just gets an error message. Personally, I like it.

glassez avatar Sep 22 '20 11:09 glassez

I hadn't thought of that use case. I agree

arvidn avatar Sep 22 '20 11:09 arvidn

would you mind testing this? #5173

Well, I tested it a bit. It fulfill its main purpose, i.e. it does not reset progress if previously downloaded files are missing when restoring the session. In all other respects it behaves as if the session really was not interrupted. For me personally, this seems quite logical, but I'm not sure about our users (and other contributors). They may want it to actually be treated as a special case, i.e. immediately go into an error state. @Chocobo1, @thalieht, @FranciscoPombal? But even if it is really needed, it seems to me that with this flag, we can implement the rest at the qBittorrent level (for example, initiate reading of some piece that is supposed to be already downloaded, or something else).

Just for fun, how is it supposed to behave if a new torrent is added? Can it be useful for any purpose?

glassez avatar Sep 22 '20 20:09 glassez

Not sure how I feel about this yet (especially the part about resetiing/not resetting the progress in the different circumstances).

But consider the the following case:

1- Download a torrent to external drive 2- pause torrent 3- close client 4- disconnect external drive 5- start client

After step 5, I would expect a "missing files" error. If qBittorrent does not check paused torrents for errors on startup, that places the burden of ensuring integrity on the user. For example, suppose I'm running qBittorrent via some service manager that auto-restarts it in case of a crash, and I add a torrent, finish downloading it, and let it sit in the client for some days weeks. Then, one day I decide I am going to remove the torrent from the client and move the payload somewhere else. But, suppose that some time before I actually do that, qBittorrent crashes and restarts, and the file becomes slightly corrupted in the process. If qBittorrent doesn't validate fastresumes on start, I'll end up with bad data in these situations, unless I force myself to force recheck every torrent immediately before removing them from the client. I don't think that's desirable.

The problem with this is that it makes working with external drives very annoying. If you forget to plug it in, all torrents are immediately FUBAR and need to be fully rechecked the next time the drive is connected (many users complain about this all the time). So perhaps there should be another state besides "paused" named "frozen", in which such checks would be skipped. The new name clearly communicates to the user the different guarantees and mechanics. This state would enable the "archiving feature", that has been asked for at least a couple of times. qBittorrent would effectively become capable of acting like an "archive" of torrents.

FranciscoPombal avatar Sep 22 '20 23:09 FranciscoPombal

@FranciscoPombal Your comment clearly shows that you are not familiar with qBittorrent internals and its main difficulties in interacting with libtorrent. The loss of current progress when running with inaccessible disks is one of the long-standing problems that we have been struggling with for many years, only recently achieving at least some more or less satisfactory result. But it still can't be called either the best or reliable, because it's just another workaround using resume data substitution, etc. So this change is good in itself. I was just asking about related aspects. Should it not check for files at all at startup and continue running as if the session didn't really break, and only show errors when they actually occur (for example, when trying to read or write a disk), as @arvidn suggested. Or, as I suggested initially, check for files at startup, but if there are no files that are marked in the resume data as already downloaded do not reset the progress (and do not go to the checking state), just go to a certain erroneous state.

qBittorrent crashes and restarts, and the file becomes slightly corrupted in the process. If qBittorrent doesn't validate fastresumes on start, I'll end up with bad data in these situations

It seems impossible for me that existing files can be corrupted due to qBittorrent crash. Anyway qBittorrent doesn't really check all the torrents during startup.

If you forget to plug it in, all torrents are immediately FUBAR and need to be fully rechecked the next time the drive is connected (many users complain about this all the time).

It's really symptom of "losing progress" problem. We cannot just resume torrent after disk is reconnected since internally it has 0 progress and it just starts to rewrite all existing data from the scratch. So currently qBittorrent user have to restart application after disk is reconnected or recheck affected torrents.

glassez avatar Sep 23 '20 03:09 glassez

I was just asking about related aspects. Should it not check for files at all at startup and continue running as if the session didn't really break, and only show errors when they actually occur (for example, when trying to read or write a disk), as @arvidn suggested. Or, as I suggested initially, check for files at startup, but if there are no files that are marked in the resume data as already downloaded do not reset the progress (and do not go to the checking state), just go to a certain erroneous state.

Without regard to the underlying difficulties, I would choose the "check files at startup" way. I consider showing more accurate state of the torrents when qbt starts up is preferred if possible.

But even if it is really needed, it seems to me that with this flag, we can implement the rest at the qBittorrent level (for example, initiate reading of some piece that is supposed to be already downloaded, or something else).

I imagine we can check if the files are accessible and if not we show some error state.

Chocobo1 avatar Sep 23 '20 05:09 Chocobo1

one slight complication with:

"check files and stop the torrent if the resume data doesn't match the disk, but keep all the resume state"

is that currently checking the files on disk is kind of tightly coupled with also establishing a correct state for the torrent (i.e. one where there is no data corruption). For example, if the seed_mode flag is set, and there's a file missing, that flag is cleared.

To have another flag control this behavior would not only add complexity to the code but also to the documentation.

The use case seems to be: "the drive is unmounted", but what would happen in the case where some files are there and some aren't? would that still be treated as "the check failed, stop the torrent and keep the resume data"?

arvidn avatar Sep 23 '20 20:09 arvidn

I'm sure that keeping progress and declaring an error is the best solution anyway. It is unlikely that resetting progress and re-downloading files is what the caller expects when it provides resume data.

glassez avatar Sep 24 '20 09:09 glassez

one nuance is that "providing resume data" isn't quite a "yes" or "no". Since the same code path is used for adding a torrent for the first time as subsequent times.

For example, "peers" is resume data that is useful when resuming a magnet link (made relevant with the recent change to allow saving resume data before metadata is downloaded). trackers and DHT nodes are also resume data, but may also be specified the first time a torrent is added.

I imagine one could consider "have_pieces" as the main resume data. But there are other fields too, like the seed-mode flag. If the seed mode flag is set, there's also "verified_pieces", which indicate which pieces have been checked and which ones still need to be checked on-demand as they are requested.

I think there are some details that would need to be fleshed out.

Anything in add_torrent_params that imply a specific state about the files on disk could be considered resume-data (for this purpose). If any of them are found to be wrong, the torrent should go into an error state, but the resume state should remain the same.

Does that sounds right? I'll have to think about the consequences of that.

arvidn avatar Sep 24 '20 10:09 arvidn

I imagine one could consider "have_pieces" as the main resume data.

Yes.

But there are other fields too, like the seed-mode flag. If the seed mode flag is set, there's also "verified_pieces", which indicate which pieces have been checked and which ones still need to be checked on-demand as they are requested.

Is "have_pieces" used with "seed_mode"? IIRC, no. But why do you need yet another field "verified_pieces"? Why not reuse existing one to store verified pieces in seed mode?

Anything in add_torrent_params that imply a specific state about the files on disk could be considered resume-data (for this purpose). If any of them are found to be wrong, the torrent should go into an error state, but the resume state should remain the same. Does that sounds right?

It sounds right except the cases when there are inconsistencies in resume data itself (e.g "verified_pieces" is presented but no "seed_mode", etc.). In such cases "add_torrent" should fail (as I suggested above).

glassez avatar Sep 24 '20 10:09 glassez

@arvidn, what are your current conclusions, given all of the above?

glassez avatar Oct 03 '20 10:10 glassez

@glassez @arvidn On this topic: My application has a torrent cache that stores all torrent_handle states when it is deleted. When a user adds a new .torrent file to a download, the application tries to recover the torrent_handle state from the cache. This greatly speeds up the check of already downloaded data (and other data...). And if in the process of adding a torrent the download does not start (due to lack of data on disk), how can I solve this problem? I really hope you don't break this functionality.

master255 avatar Oct 20 '20 10:10 master255

I really hope you don't break this functionality.

It turns out that we will still have to regulate this behavior using the flag, if @arvidn is still worthy to make changes here.

glassez avatar Oct 20 '20 10:10 glassez

@master255 I don't know what you're referring to. It sounds like you describe normal use of resume-data.

I did land this: https://github.com/arvidn/libtorrent/pull/5173

Which effectively allows deferring some checks of the resume data until a peer requests pieces. My understanding is that this solves the problem of save directories on external drives that sometimes may be unmounted.

arvidn avatar Oct 20 '20 11:10 arvidn

I did land this: #5173 Which effectively allows deferring some checks of the resume data until a peer requests pieces. My understanding is that this solves the problem of save directories on external drives that sometimes may be unmounted.

Well, it fixes most important inconvenience. It shouldn't be too difficult to implement the rest in the app itself. Do you have a forecast for v1.2.11 release date?

glassez avatar Oct 20 '20 11:10 glassez

@arvidn

I don't know what you're referring to.

Very often I hear it from you. Whatever it is, I wrote an email. In my application I made a new system for opening and saving torrents. I have not seen such a system anywhere. Why - I do not know. Check your mail box.

master255 avatar Oct 20 '20 13:10 master255

Very often I hear it from you.

I can be more specific.

You say you have a cache of "torrent_handle state". a torrent_handle is a std::weak_ptr<torrent>. It doesn't seem likely you're referring to that. If you're referring to the state of the torrent, i.e. the resume data, as stored by add_torrent_params, that's exactly how it's meant to be used and there are several examples in the libtorrent repo, along with all prominent applications using libtorrent.

This greatly speeds up the check of already downloaded data

This further supports the reading that you're referring to resume data, as that's what resume data does.

And if in the process of adding a torrent the download does not start (due to lack of data on disk), how can I solve this problem?

I didn't interpret this as a question, since it doesn't seem on topic, and lacking any specifics, there is no (generic) solution to a torrent that cannot be downloaded. It depends entirely of circumstances not mentioned in your question.

I also got distracted by your last sentence: "I really hope you don't break this functionality." which presumably refers to resume data. Yes, I also hope I don't break resume data. It's not apparent why you bring this concern up in this thread, especially without any specifics.

If you want to elaborate on either the issue of torrents not downloading or your use of resume data, please file a new ticket.

arvidn avatar Oct 20 '20 14:10 arvidn

@arvidn

It's not apparent why you bring this concern up in this thread, especially without any specifics.

I wrote about it because the above was a speech about recovering "resume data" with an error state. Now the errors are ignored and this is good for my cache. If "resume data" is recovered with an error state, I will not be able to use it as a cache.

How my cache works: When any torrent is removed from a session, add_torrent_params is serialized into text and written to an array: (infoHash: serialized (add_torrent_params)). When quitting the program, the cache is written to disk. When the program starts, the cache array is restored. Whenever a torrent is added to a session, add_torrent_params is first taken from the cache and then the necessary parameters are set - this significantly speeds up re-downloading and continued downloading of data.

I think all torrent programs should have this cache. This will significantly speed up the work of torrents. At this moment, only my player uses this type of cache. I do not understand why.

I hope I have explained everything correctly and clearly. Arvin health to you :-)

master255 avatar Oct 20 '20 21:10 master255

Seems like you're describing resume data, which practically every client has.

vktr avatar Oct 21 '20 07:10 vktr

@vktr No. This is simply to check.

  1. Start downloading the file
  2. Delete file download without deleting data.
  3. Start downloading the file again. If the file continues to be downloaded instantly, then cache is present, and if there is a long check of data, then there is no cache.

master255 avatar Oct 21 '20 09:10 master255

@master255 Care to show some code? So far all you've done is to confirm the behavior to be similar/equal to resume data. If it's a superior way of doing things I think most client applications would be interested in seeing how it's done.

vktr avatar Oct 21 '20 09:10 vktr

If you store some "cached" data after the user deleted the torrent from the app, I don't think that's what the user expects from it.

glassez avatar Oct 21 '20 09:10 glassez

@vktr What code are you talking about? Read this message and tell me which point needs a code sample? And in what language do you need an example? I use a Java language.

@glassez

If you store some "cached" data after the user deleted the torrent from the app, I don't think that's what the user expects from it.

What is the problem with this? What difference does it make?

master255 avatar Oct 21 '20 09:10 master255

@master255

I'm happy you linked to the exact message I wanted to quote. You say;

How my cache works: When any torrent is removed from a session, add_torrent_params is serialized into text and written to an array: (infoHash: serialized (add_torrent_params)). When quitting the program, the cache is written to disk. When the program starts, the cache array is restored. Whenever a torrent is added to a session, add_torrent_params is first taken from the cache and then the necessary parameters are set - this significantly speeds up re-downloading and continued downloading of data.

I think all torrent programs should have this cache. This will significantly speed up the work of torrents. At this moment, only my player uses this type of cache. I do not understand why.

You mention "I think all torrent programs should have this cache. This will significantly speed up the work of torrents. At this moment, only my player uses this type of cache. I do not understand why.". That cache you mention, that is what I would like to see some code samples from. You can show it in any language you want.

The reason I'm asking is because I said this,

Seems like you're describing resume data, which practically every client has.

To which you responded,

@vktr No. This is simply to check.

That made me interested, since you're specifically not describing resume data, but some other not similar cache to save/restore torrents from/to the session.

vktr avatar Oct 21 '20 09:10 vktr

@master255 are you opposed to using the term "resume data" instead of "cache". It's just a historical name, since back in 2004 when bittorrent clients started doing this. It's easier to communicate with shared terminology, if you want to use a different word, please be explicit about defining your new term. It's quite obvious that what you are describing is normal use of resume data, the fact that you insist that it isn't (while failing to describe how it differs) gives the impression that you're trolling.

arvidn avatar Oct 21 '20 09:10 arvidn

@arvidn

gives the impression that you're trolling.

No. I am nobody's troll. I just wanted to tell about my ideas of using "resume data".

are you opposed to using the term "resume data" instead of "cache".

No. I do not oppose the use of these terms. By cache I meant the array of " resume data".

Read this message. I describe the work of the cache in as much detail as possible. And how to check the presence of a cache. If you are not interested, then let's ignore this, my beautiful idea.

@vktr In general, it's the same as using regular save resume data and restore, just with longer storing time.

master255 avatar Oct 21 '20 10:10 master255

Another fun way "the drive is unmounted" caused qBitTorrent and uTorrent to lose track of torrents was when I was trying to do seed tests off a ramdrive or USB flash memory stick -- the downloaded torrents were on the ramdrive or USB flash memory stick BUT they were sometimes given a different drive letter when the drive was re-added to the computer.

It was easier in qBitTorrent to remove and re-add those torrents with "don't recheck" option than changing the torrent's location and doing a manual force recheck. Doing repeated seeding tests to the same peer even required clearing peer lists (which meant removing+readding the torrent or possibly closing+restarting qBitTorrent) because the initial seed would no longer attempt to to connect to the peer even after the peer restarted from 0% complete.

Seeker2 avatar Oct 22 '20 03:10 Seeker2