"search for existing data files" incorporates pieces or just whole files?

Open as-muncher opened this issue 3 years ago • 29 comments

Java 1.8.0_202 (64 bit) Oracle Corporation c:\program files\biglybt\jre

SWT v4942r22, win32, zoom=100, dpi=120 Windows 10 v10.0, amd64 (64 bit) B2.7.0.2/4 az3

So, my request is that "search for existing data files", would incorporate by pieces, not just copy the whole incomplete file if it finds an exact match. I am also asking that that feature would ignore the file extension, so that if I am looking for files downloaded with qbittorrent, that BiglyBT would ignore the .!qB extension, which saves me an extra step of having to rename the files. If BiglyBT finds a match, then I would hope that it would copy each individual piece that matches, and then after that search is done, that if I tell it to search in a different location, and it also finds a file that matches that filename except maybe it has a different extension, that it would see if the data matches, and incorporate as many pieces of that file also as it can.

as-muncher avatar Jun 28 '21 22:06 as-muncher

The extension isn't used - it matches based on file size and content.
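A minimal sketch of what size-based candidate selection could look like, in Python. This is not BiglyBT's code; `index_by_size` and `candidates_for` are hypothetical helper names. Files are grouped by byte length so that only same-sized files need their content checked, which is why the extension never enters into it.

```python
import os
from collections import defaultdict


def index_by_size(root: str) -> dict:
    """Group every regular file under root by its size in bytes."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)
    return by_size


def candidates_for(wanted_size: int, by_size: dict) -> list:
    """Return paths whose size equals the torrent file's size (sorted for stable output)."""
    return sorted(by_size.get(wanted_size, []))
```

Each candidate returned this way would still need its content verified against the torrent's piece hashes before being trusted.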

parg avatar Jun 29 '21 07:06 parg

Well, phase one of matching does - you're right, for small files that can't be piece-matched it drops back to name based matching.

parg avatar Jun 29 '21 07:06 parg

next beta will have a match mode of 'piece' which will copy pieces into the original files when found to be correct
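As a rough illustration of piece-level matching (a sketch under stated assumptions, not the actual implementation): BitTorrent v1 torrents carry one SHA-1 hash per piece, so a candidate file can be checked piece by piece. The function name and parameters below are hypothetical, and the sketch only looks at pieces that lie wholly inside the file.

```python
import hashlib


def matching_interior_pieces(candidate: bytes, file_offset: int,
                             piece_len: int, piece_hashes: list) -> list:
    """Return indices of pieces fully inside the file whose SHA-1 matches.

    file_offset is the file's starting byte within the torrent's payload
    (all files concatenated); piece_hashes[i] is the 20-byte digest of piece i.
    """
    first = -(-file_offset // piece_len)                # first piece starting inside the file
    end = (file_offset + len(candidate)) // piece_len   # one past the last piece ending inside it
    matched = []
    for index in range(first, end):
        start = index * piece_len - file_offset         # piece start, relative to the file
        chunk = candidate[start:start + piece_len]
        if hashlib.sha1(chunk).digest() == piece_hashes[index]:
            matched.append(index)
    return matched
```

A partially downloaded or partially damaged candidate would simply yield fewer matching indices, which is the point of matching by piece rather than by whole file.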

parg avatar Jun 29 '21 14:06 parg

Hey, was checking out the new mode, two little things:

  1. The new field makes the contents too wide for the dialog, and it's not resizable: image

  2. The 'Mode' fields...

    1. The second combo box only activates for "Link" mode, not any of the other three — so maybe those two choices would be more convenient as two distinct items in the first field?
    2. The two Link types are "Hard" and "Internal"... guh? What's an "internal link"? I know what a hardlink is, I'd have expected the other type to be softlinks, at least on Linux... Or does "Internal link" imply setting the Path for that file to the discovered location, in the torrent? (If so, I'd suggest something like "Hardlink" (for linking), vs. "Reference" or "Store path" or similar, for internal "linking".)
    3. (Also, followup, why aren't softlinks an option? And what happens if you select Link/Hard and the files are on a different device from the torrent output directory?)

ferdnyc avatar Jun 29 '21 19:06 ferdnyc

(One option, for the dialog, would be to get the "Use default locations" checkbox out from that space on the left side of the Mode selection, since it's not really related in any way.)

ferdnyc avatar Jun 29 '21 19:06 ferdnyc

Oops, one more little thing, re: Piece mode. After a run over a large set of files, matching quite a few...

.
Found 468 files with 399 distinct sizes
Processing '$torrent', piece size=4.19 MB
    $dir/$file1 (size=92.04 MB) - 2 candidate(s)
        Testing $path_to_candidate1a -             Setting storage type to linear - OK
            Setting priority to normal - OK
..... Copied 21 pieces
        Testing $path_to_candidate1b - ..... Copied 21 pieces
    $dir/$file2 (size=78.12 MB) - 2 candidate(s)
        Testing $path_to_candidate2a -             Setting storage type to linear - OK
            Setting priority to normal - OK
..... Copied 18 pieces
        Testing $path_to_candidate2b - ...... Copied 18 pieces
    $dir/$file3 (size=52.03 MB) - 1 candidate(s)
        Testing $path_to_candidate3 -             Setting storage type to linear - OK
            Setting priority to normal - OK
... Copied 11 pieces
[...]
    $dir/$file22 (size=35.52 MB) - 2 candidate(s)
        Testing $path_to_candidate22a -             Setting storage type to linear - OK
            Setting priority to normal - OK
.. Copied 8 pieces
        Testing $path_to_candidate22b - . Copied 8 pieces
    Matched=22, complete=0, ignored as not selected for download=0, no candidates=21, remaining=22 (total=3503)
    Looking for other potential name-based matches
    Root folder is $path_root
    Copied 0 of 22
    Changed 22 file priorities from 'skipped' to 'normal'
6/29/21, 3:55 PM: Complete, downloads updated=1

...So, after all that updating, it reports "Copied 0 of 22". Which is a bit weird. Minor, of course, but... weird.

ferdnyc avatar Jun 29 '21 20:06 ferdnyc

Shit. One more thing. (Sorry!)

This is a directory of files, all exact matches, discovered with piece mode:

image

As you can see, even though the files are contiguous in the torrent, the pieces they straddle were all lost. Using Copy mode, the same files end up like this:

image

(I guess the statement could be "Piece mode is only for incomplete files", but... that's a little meh, isn't it? Seems like it should still DTRT with complete files, even if it's in Piece mode.)

ferdnyc avatar Jun 29 '21 20:06 ferdnyc

Yes, that's right - in 'piece' mode it only copies matching pieces - if you don't want that behaviour then use 'copy'.
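The piece counts in the log earlier in the thread are consistent with this. A worked sketch, assuming the log's "4.19 MB" piece size means 4 MiB and taking the rounded file sizes at face value: a file of length L holds either floor(L/P) or floor(L/P) - 1 whole pieces, depending on where it starts relative to the piece grid, and the pieces straddling its two ends cannot be verified from that one file alone.

```python
def interior_piece_count(file_offset: int, file_len: int, piece_len: int) -> int:
    """Number of pieces lying wholly inside a file that starts at file_offset."""
    first = -(-file_offset // piece_len)            # ceil(file_offset / piece_len)
    end = (file_offset + file_len) // piece_len     # floor of the file's end position
    return max(0, end - first)


def interior_piece_bounds(file_len: int, piece_len: int) -> tuple:
    """Smallest and largest possible interior count when the offset is unknown."""
    full = file_len // piece_len
    return max(0, full - 1), full


PIECE_LEN = 4 * 1024 * 1024  # assumption: "4.19 MB" in the log is 4 MiB

# (approximate size in bytes from the rounded log value, pieces reported as copied)
LOG_ROWS = [(92_040_000, 21), (78_120_000, 18), (52_030_000, 11), (35_520_000, 8)]

for size, copied in LOG_ROWS:
    low, high = interior_piece_bounds(size, PIECE_LEN)
    print(f"{size / 1e6:.2f} MB: {low}..{high} whole pieces possible, log says {copied}")
```

Every "Copied N pieces" figure falls inside the computed range, which fits the explanation that only whole, matching pieces are copied and the edge pieces are left to be downloaded.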

parg avatar Jun 29 '21 21:06 parg

The 'link' stuff has been there since the last release, maybe more... Soft links are dangerous due to the fact that they're not reference counted and can lead to unwanted deletion - the user that was pushing for support for hard links was keen that they weren't supported...

parg avatar Jun 29 '21 21:06 parg

I just hope that the dialogues are straightforward and easy-to-understand. I don't want to have to take the extra step of looking up what the modes are on the biglybt wiki, if there is one. Thank you so much for adding this feature, the piece matching. This searching for existing data files was the main reason I switched to using BiglyBT, because I thought that it already would incorporate by pieces, so now I'm excited that it will now do that.

Personally, I would want BiglyBT to incorporate as fully as possible, so that if I chose "by piece", that it would prompt me or something for those missing pieces at the beginning and end of the files, as mentioned by @ferdnyc . Perhaps there could be an option to choose "by piece, then by copy", if I'm saying it right. It would make sense that if I have the whole file already downloaded, that BiglyBT would check it that it's all there and reflect that. It saves having then to do another step of "by copy". I think it would be wonderful if BiglyBT could prompt to fill in the missing beginning and end pieces, or just do that automatically.

as-muncher avatar Jun 29 '21 23:06 as-muncher

@parg

The 'link' stuff has been there since the last release, maybe more...

That sounds about right, for when I last looked at SFEDF. I go through fits and spurts where I step away from a thing for a while, but I get back around to it eventually.

Soft links are dangerous due to the fact that they're not reference counted and can lead to unwanted deletion - the user that was pushing for support for hard links was keen that they weren't supported...

That's... fair enough, I suppose?

I mean, there can always be unwanted deletion of a file's contents, if you're modifying the target of a link — but that's equally true for hard links. I've definitely accidentally wiped out the contents of my existing files in the past, by setting a torrent to save "over" them... but symlinks are not appreciably more dangerous than any other kind, there. (In fact, less! Deleting a symlink will never wipe out the target file, unlike deleting the last hard link to it.)

So that's why I'm a bit confused... why would soft links need to be reference counted? They're non-owning references, effectively.. the original target file always exists on disk, whether there are 0, 1, or 100 soft links pointing to it. Unlike with hard links, the "last softlink" isn't anything special. The number of them is immaterial. Unwanted deletion of the file itself should be impossible, unless some code is chasing down link targets and messing with them like it's not supposed to.
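The ownership difference is easy to check. A small Python sketch, assuming a POSIX-like filesystem where both link types can be created; the file names are made up for the demonstration:

```python
import os
import tempfile


def link_demo() -> dict:
    """Create one hard link and one symlink to a file, then delete things and observe."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        hard = os.path.join(tmp, "hard.bin")
        soft = os.path.join(tmp, "soft.bin")
        with open(original, "wb") as fh:
            fh.write(b"payload")

        os.link(original, hard)      # second name for the same data
        os.symlink(original, soft)   # non-owning pointer to a path
        links_before = os.stat(original).st_nlink  # symlinks are not counted here

        os.remove(soft)              # removing a symlink never touches the target
        survives = os.path.exists(original)

        os.symlink(original, soft)
        os.remove(original)          # data survives through the remaining hard link
        with open(hard, "rb") as fh:
            hard_content = fh.read()
        dangling = os.path.islink(soft) and not os.path.exists(soft)

        return {
            "links_before": links_before,
            "original_survives_symlink_removal": survives,
            "hard_link_content": hard_content,
            "symlink_dangling": dangling,
        }
```

The link count only tracks hard links, so the number of symlinks pointing at a file has no bearing on whether its data stays on disk.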

Sure, it's a problem if two torrents are pointing at the same destination file, but that can be detected by resolving both links to their target path on disk.

A bigger issue is that soft links can be relative, and when they are they can't be moved without breaking the link. That's a real concern when it comes to doing things like completion-moving. OTOH, they even work across filesystems, something hardlinks can't offer. And BiglyBT could certainly just not create relative symlinks? :laughing:
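To make the relative-versus-absolute point concrete, here is a hedged Python sketch (POSIX-like filesystem assumed, directory and file names invented) that moves a symlink to another directory and checks whether it still resolves:

```python
import os
import tempfile


def symlink_survives_move(relative: bool) -> bool:
    """Create a symlink next to its target, move the link elsewhere, test it."""
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = os.path.join(tmp, "downloads")
        dst_dir = os.path.join(tmp, "completed")
        os.mkdir(src_dir)
        os.mkdir(dst_dir)

        target = os.path.join(src_dir, "data.bin")
        with open(target, "wb") as fh:
            fh.write(b"payload")

        link = os.path.join(src_dir, "data.link")
        # A relative link stores just "data.bin"; an absolute one stores the full path.
        os.symlink("data.bin" if relative else target, link)

        moved = os.path.join(dst_dir, "data.link")
        os.rename(link, moved)        # moves the link itself, not the target
        return os.path.exists(moved)  # False if the link now dangles
```

Calling it with `relative=True` shows the moved link dangling, while the absolute variant keeps working after the move.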

...Aaaaanyway, I'm fine with not having softlinks, really. But I'm still in this situation where I really have no idea what "Link"/"Internal" actually means? If my guess earlier was right, then I think there has to be a better name for whatever-kind-of-path-reference is being made than an "internal link". Those words put together like that mean nothing.

(It would be a slightly less terrible way to describe a relative symlink, actually, but we know it's not that.)

But I'm still kind of leaning towards just one list of modes, with five options, maybe:

  • Set location
  • Hard link
  • Move
  • Copy
  • Piece copy

It still needs a little polish. But it's better than...

  • Link
    • Internal
    • Hard
  • Move
  • Copy
  • Piece

...Especially since, if I've understood this so far, Link/hard and Link/internal are totally unrelated, really. One is a filesystem operation (which may or may not be possible, depending on the locations involved), whereas the other is a torrent metadata update that doesn't touch any disk paths at all.

ferdnyc avatar Jun 30 '21 17:06 ferdnyc

Sure, it's a problem if two torrents are pointing at the same destination file, but that can be detected by resolving both links to their target path on disk.

A bigger issue is that soft links can be relative, and when they are they can't be moved without breaking the link. That's a real concern when it comes to doing things like completion-moving. OTOH, they even work across filesystems, something hardlinks can't offer. And BiglyBT could certainly just not create relative symlinks?

Oh!!! Yes, and of course the user might delete the target file, which also creates a broken link... is that what the concern was? I've heard of people doing boneheaded things with Dropbox folders, too, where they made symlinks to files stored elsewhere, then deleted the originals and were surprised when Dropbox held only broken links... But the kind of people who blunder into those situations tend to find hardlinks even more incomprehensible, in my experience. (They'll do things like edit the file, and then be surprised when "both copies" are changed!)

ferdnyc avatar Jun 30 '21 17:06 ferdnyc

@ferdnyc You bring up something interesting about linking. I know that I will download a torrent, for example called "ABC" that is 10 GB, and then when I do a search on btdig.com for example, and find, oh, now someone has added content to it and still called it "ABC" but now it's 30 GB. So, then let's say I now add that ABC 30 GB torrent, and so now what happens? Will BiglyBT continue to download from the seeders and peers from both torrents, even though they go to the same file(s)? Hopefully yes! But then what happens when I choose to delete the first torrent which is only 10 GB? Will my files still exist for the 30 GB torrent? Hopefully also yes. But then also sometimes people will still name it ABC, but it will have some files different. It would be nice for sure if there were some sort of dialogue brought up by BiglyBT so that the user can see which files are the same, which are different, when adding a torrent with the same file name. Sometimes, torrent ABC 30 GB will have slightly different data for the very same file with the exact name common to that torrent and to the ABC 10 GB torrent, so that the user then has to figure out maybe which one has faulty data, or which one is at a lower quality, or what the case may be. Personally, I hate when torrent files are saved with their extension as if they were fully complete instead of having some sort of .!az extension or .!qB etc. So maybe someone then chooses to create a new torrent with more files in it, and yet it hasn't even yet been fully downloaded to completion, you know?

as-muncher avatar Jun 30 '21 20:06 as-muncher

@as-muncher

Ohhh-kay, wow, there's a lot there. Well, without turning this into a deep dive on decentralized, distributed peer-to-peer file sharing and replication, a few notes...

@ferdnyc You bring up something interesting about linking. I know that I will download a torrent, for example called "ABC" that is 10 GB, and then when I do a search on btdig.com for example, and find, oh, now someone has added content to it and still called it "ABC" but now it's 30 GB.

Well, first off, the name of the torrent means nothing. It's simply a label, and there is nothing special about two torrents that happen to have the same label. In fact, that label is so meaningless, you're free to change it even while you're in the process of seeding, and it affects nothing. (There's both Rename Displayed Name and Advanced ⯈ Rename, in the context menu.)

So, if someone uploads a different torrent with the same name, automatically exactly nothing happens, to anyone. It's a different torrent. It has no relationship to the other, previous version(s). Generally, though, there are a few scenarios, with different options available to you:

  1. The new torrent is an alternative version of the previous, smaller torrent. Perhaps it's higher-quality, or it's less compressed. Generally, then, the two torrents will have no blocks in common — you'll continue to seed the smaller torrent, you have to separately download and seed the larger if you want to, and common file blocks (pieces) between the two torrents are unlikely, so they'll just both take up space, separately, on your disk. There's no real way around that.

  2. Or, the new torrent is a superset of the previous torrent, perhaps a collection that includes its content as well as other, related torrents. This is the precise situation SFEDF was created to address. Very frequently, torrents will have contents in common with other torrents, files which we may already possess some or all parts of — perhaps under different filenames, or organized in a different way, but still the same files.

    The user still has to manually run SFEDF to match up those pieces. (It's not a safe action to run automatically, which is why the torrent has to be stopped before it can even be done at the user's explicit request.) But if you choose to join a swarm for a collected torrent, and you know you already have some of the contents downloaded from a previous torrent, before starting it (or after stopping it) you can use SFEDF to avoid redundantly downloading those same files again.

So, then let's say I now add that ABC 30 GB torrent, and so now what happens? Will BiglyBT continue to download from the seeders and peers from both torrents, even though they go to the same file(s)? Hopefully yes!

You should absolutely never, ever have two torrents downloading to the same files at the same time. They will get in each other's way and likely cause lots of extra work re-downloading pieces that they clashed over and corrupted. It's fine to seed the same files to two different torrents at once, but only when they're already completed.

Typically, what I do in those situations is... well, it depends what the status of the smaller torrent is. If it's completed, and still worth seeding, I'll typically leave it running. If it's partial, and I want to "migrate" to the larger, more complete torrent, I'll stop it and plan to delete it after I've started the larger one.

But either way, I'll load the larger one in stopped state, then use SFEDF first thing (before ever starting it) to pre-complete the files I already have, right from the first torrent's download directory. In fact, if it's a mega-collection that I don't plan to completely download in its entirety, I'll often start with those existing files as the only ones I have enabled, for the larger torrent:

  1. First I disable all of the contents in the new torrent's Files tab (which in tree view is just clicking a single checkbox at the root of the tree)
  2. Next I run SFEDF one or more times, scanning some or all of the subtrees, to match up and enable any files I already have. (I basically single-handedly browbeat @parg into adding the "Search for skipped files and enable..." checkbox to the SFEDF dialog, without which this workflow would be impossible. Or, well, it was very tedious in the past, to say the least.)
  3. Then, after I've matched and enabled only the files I have, I start the torrent and let it get to seeding. Typically it also has to do some partial downloads, for edge pieces that I don't have in their entirety. So it'll download the last few megs of data, then switch to seed mode.
  4. While it's doing that, I can re-enable any of the files I don't already have, but want, and it'll add those pieces to the request list.

At that point, most of the time I'll delete the smaller torrent, because it makes more sense to focus on the collected one and encourage people to use that swarm instead. Experience tells me it's far more likely to stay well-seeded over the long term, compared to the smaller one. Aggregate swarms are healthier swarms.

But then what happens when I choose to delete the first torrent which is only 10 GB? Will my files still exist for the 30 GB torrent? Hopefully also yes.

That depends entirely on you, and how you choose to manage your files. My personal configuration, which leans towards safety (because disk is cheap), is:

  1. I don't use download completion moving (which transfers files out of the torrent download directory to some other place on disk, when they're finished downloading), but that's one way to protect your existing files. Especially when combined with...
  2. In Options ⯈ Files ⯈ Completion Moving, there are checkboxes for:
    • "Move completed files (when being removed)" (I have it pointed to my trash directory)
    • "only if in default data dir" (which is a suboption of the previous)

    I have both of those enabled, so that any time I use SFEDF to match up files outside of the torrent download directory, they are guaranteed not to be destroyed when I delete the torrent. That helps protect my non-torrent-scratch-dir files, even when I choose to seed them in a new torrent.
  3. There's also a checkbox there for 'Move even if some files are flagged "Do Not Download"' — I keep that enabled, but if you're in the habit of mixing-and-matching subsets of files in the download directory between multiple torrents at once, then unchecking it can offer some protection against accidentally removing files that an active torrent is still using.

How you use SFEDF will also affect things.

  • If you use "Copy" (or "Piece") mode to duplicate the data blocks into the new torrent's download dir, then there is no relationship between the two directories and either can be deleted independent of the other. (But common blocks take up double disk space.)
  • If you use "Move" mode, then the old torrent no longer has those files in its download directory (or, they are now empty), and if you were to start it they would be re-downloaded.
  • If you use either of the "Link" options, then the files are shared between both torrents. In the "Link/Hard" case, the same file exists in both directories, but it's safe to delete either one without affecting the other. (You just have to be careful not to delete both/all links, because removing the last link to those blocks on disk equals deleting the file.) In the "Link/Internal" case, the only copy of the file(s) in question is at the first download location (the second torrent points directly there, for those files), so deleting the files for the first torrent will cause the second torrent to lose access to them, and it will turn red and unhappy in the Library view.

Personally I tend to use "Copy" mode for files in my torrent download space (because the bit of extra disk is cheap, and I don't have to worry about torrent interrelationships), and for files outside of the torrent download space I use (what I guess is now called) "Link/Internal", since like I said BiglyBT is configured not to mess with those files when it deletes things, for me.

But then also sometimes people will still name it ABC, but it will have some files different. It would be nice for sure if there were some sort of dialogue brought up by BiglyBT so that the user can see which files are the same, which are different, when adding a torrent with the same file name.

Sometimes, torrent ABC 30 GB will have slightly different data for the very same file with the exact name common to that torrent and to the ABC 10 GB torrent, so that the user then has to figure out maybe which one has faulty data, or which one is at a lower quality, or what the case may be.

The only way to know which files are the same is to use SFEDF, because like I said the names mean nothing. To determine that two torrents have common files, BiglyBT needs to scan each block of each file and verify that it exactly matches the torrent manifest — BitTorrent doesn't work with "files", it manages "pieces". Those pieces can be arranged into the shape of files, when they're saved to disk, but that's almost a side-effect... much of what a BitTorrent client does completely ignores the fact that its disk blocks happen to represent data files, in the torrent space they're just collections of data blocks.
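To picture how pieces relate to files, here is a minimal Python sketch (the function name and the tuple layout are illustrative assumptions, not any client's API). The torrent's payload is treated as all files laid end to end, and each fixed-size piece is mapped back to the file ranges it covers.

```python
def piece_segments(piece_index: int, piece_len: int, file_lengths: list) -> list:
    """Return (file_index, offset_in_file, byte_count) for each file a piece touches."""
    start = piece_index * piece_len
    end = min(start + piece_len, sum(file_lengths))  # the last piece may be short
    segments = []
    file_start = 0
    for file_index, length in enumerate(file_lengths):
        file_end = file_start + length
        lo, hi = max(start, file_start), min(end, file_end)
        if lo < hi:  # the piece overlaps this file
            segments.append((file_index, lo - file_start, hi - lo))
        file_start = file_end
    return segments


# Three files of 10, 6 and 4 bytes, with 8-byte pieces:
for index in range(3):
    print(index, piece_segments(index, 8, [10, 6, 4]))
```

In this toy layout the middle piece covers the tail of the first file and all of the second, so its hash can only be checked when both ranges are available.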

Plenty of files that have the same name, are completely different and have zero blocks in common.

Other files will have been renamed, moved to a different directory, and rearranged in 17 different ways, but they're still the same file. The former situation is useless to BiglyBT, there's no way it can make use of those non-matching pieces. The second, it's able to detect despite all of the disk layout changes... but it requires a SFEDF scan to match the common pieces up.

Personally, I hate when torrent files are saved with their extension as if they were fully complete instead of having some sort of .!az extension or .!qB etc. So maybe someone then chooses to create a new torrent with more files in it, and yet it hasn't even yet been fully downloaded to completion, you know?

Well, like I said in general aggregation is a good thing. Larger torrents lead to larger swarms. Larger swarms retain more seeders. More seeders translates into healthier and longer-seeded torrents. And with tools like SFEDF, it's a fairly smooth process to migrate from smaller torrents to larger, aggregated ones, without having to duplicate a lot of download or disk bytes.

Oh, and on the file naming, check out Options ⯈ Files ⯈ File Extensions (you have to scroll way down to get to it), there are all sorts of options for incomplete-file naming. I have BiglyBT set to add a .part suffix to all incomplete files, so that when it gets removed I know they're complete.

ferdnyc avatar Jul 01 '21 00:07 ferdnyc

Ohhh-kay, wow, there's a lot there. Well, without turning this into a deep dive on decentralized, distributed peer-to-peer file sharing and replication, a few notes...

And by "a few notes" I apparently meant 2000 words. (Including quoted sections.) Oops. :angel:

ferdnyc avatar Jul 01 '21 00:07 ferdnyc

You should absolutely never, ever have two torrents downloading to the same files at the same time. They will get in each other's way and likely cause lots of extra work re-downloading pieces that they clashed over and corrupted. It's fine to seed the same files to two different torrents at once, but only when they're already completed.

For the whys and wherefores on how that works and why it needs to be that way, it's important to understand that your torrent client doesn't have control over what pieces of a file it downloads. It can merely send requests for the tracker to disseminate, basically a wish list of file blocks it needs. It is at the mercy of both the tracker and the other peers in the swarm to decide which of those requested pieces they send your way, which they will do according to their optimization algorithms, taking into account the needs of all peers and of the swarm as a whole. (Pieces that have the least duplication across the swarm typically get first priority, for starters.)

So, if a torrent client were trying to "straddle" two swarms and request the same missing pieces from both of them, it would inevitably create lots of extra, wasted work for the other peers, when they send it pieces that it's already receiving from the other swarm. And if your client tried to "intelligently" manage the two swarms, requesting one set of pieces from one, and a different set of pieces from the other, that would only serve to degrade each swarm's ability to optimize the piece distribution among its peers.

Besides, a single well-seeded swarm can deliver gigs of data, sometimes tens of gigs, per minute. For anyone who's not sitting inside an equipment rack at one of the global internet backbone peering exchanges, the limiting factor is far more often our maximum download rate, not the swarm's availability. There's really no advantage to gaming multiple swarms seeding the same files; just choose the one with more seeders, and drop the other one. The torrent will be nearly complete by the time you're done with that.

ferdnyc avatar Jul 01 '21 01:07 ferdnyc

For the whys and wherefores on how that works and why it needs to be that way, it's important to understand that your torrent client doesn't have control over what pieces of a file it downloads.

I should have said, there, that "your torrent client isn't normally in complete control" of what it downloads. And it's usually better that way.

Technically, though, there are some options for exercising more control over the download process. In BiglyBT, there's at least:

  • Every file has a "Set Priority" submenu, in its context menu in the Files tab. Setting a file to High priority will request that its pieces be sent before others, if possible. That can be a useful thing, especially when downloading a large collection where you know completing the whole thing is going to take hours.

    Sometimes you want to check out certain files ASAP, without waiting for all of the others to complete. When pieces all have the same priority, normally even in a massive torrent they all progress to completion fairly evenly. The default distribution of pieces just doesn't tend to favor any particular grouping of them, on average. But swarms will usually respect High priority, and setting it for a limited subset of files does get them to 100% much more quickly. The peers will still continue to send other pieces as well, though, so it doesn't harm the distribution much.

    (I imagine there are confused people who set EVERY file to High priority, thinking that will give them priority over the other peers in the swarm and speed up their downloads. That could not possibly be more wrong, and every file having High priority is exactly the same as every file having Normal priority. Priority is only useful for pushing some files to the front of your queue.)

  • Then there's the "Sequential Download From File" checkbox, also in the context menu for each file. I have never personally activated that option, or even really been tempted to. I've just never felt the need. I assume it works by narrowing the piece request list sent to the tracker. Possibly even requesting just one piece at a time, in order. Presumably that works, after a fashion, or the option wouldn't be there. But I can also guarantee it makes the download much slower than it would be if the swarm were allowed to optimize distribution the way it's meant to.
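The point about file priority being purely relative can be shown with a toy model. This is not how BiglyBT schedules requests; the numeric priority values and the `request_order` helper are assumptions made up for the illustration.

```python
def request_order(priorities: dict) -> list:
    """Order file names by priority (higher number first), ties broken by name."""
    return sorted(priorities, key=lambda name: (-priorities[name], name))


NORMAL, HIGH = 0, 1  # hypothetical numeric levels

all_normal = {"a.mkv": NORMAL, "b.mkv": NORMAL, "c.mkv": NORMAL}
all_high = {"a.mkv": HIGH, "b.mkv": HIGH, "c.mkv": HIGH}
one_high = {"a.mkv": NORMAL, "b.mkv": NORMAL, "c.mkv": HIGH}

print(request_order(all_normal))  # same order as all_high
print(request_order(all_high))
print(request_order(one_high))    # c.mkv moves to the front
```

Raising every file to the same level leaves the order untouched; only a difference between files changes anything.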

ferdnyc avatar Jul 02 '21 11:07 ferdnyc

@ferdnyc That's interesting about just letting BiglyBT do its thing and download whichever pieces it wants to when it wants to. I tend to set the last file sorted by name to highest priority, next last file to one step lower priority, and so forth, so I wish there were a "sort by decreasing priority" option or reverse of what's available. I find that's important, especially when the one seed shows up online and I want to try and complete some of my files as soon as possible while that lone seeder is on. At least, then, I'll have files that the other peers won't have, and then I end up being the one to send those complete files to them. I wish that the torrent client would automatically recognise when a seed is on that hasn't been on for a super long time, and give priority to that file to be downloaded, because it has been sitting for so long unfinished. @parg I hope you're reading through this issue. :) And I wish that a lot of this stuff, figuring out how to manage the downloaded files was a little simpler, so I don't have to think as much and figure it out as much.

as-muncher avatar Jul 02 '21 22:07 as-muncher

@ferdnyc That's interesting about just letting BiglyBT do its thing and download whichever pieces it wants to when it wants to.

At the risk of being overly precise, that isn't really what I said, not exactly. Like I said, BiglyBT isn't really in control of that stuff, in the typical case -- and it's better that way. My advice is, let BiglyBT let the swarm send it whatever pieces it wants, in whatever order they want. The tracker has the most complete picture of the entire swarm, and can make the most efficient decisions -- and it will, if the clients let it. Clients assigning a lot of priorities and ordering just make its job harder. To a limited extent, that can be absorbed, so things like prioritizing certain files is fine. But the more clients meddle, the more they interfere with swarm efficiency.

I tend to set the last file sorted by name to highest priority, next last file to one step lower priority, and so forth,

...Why? I'm honestly curious what purpose that would have, or even be perceived to have. Why would the alphabetically last files be more important for you to download sooner?

so I wish there were a "sort by decreasing priority" option or reverse of what's available.

I'd have to check and I'm not at my computer right now, but I'd be surprised if there isn't a priority column you can add to the files tab, and if it's a column you can sort by it.

I know there's also some sort of "decreasing priority" or something option in the "Set Priority" context submenu, which I have to assume is meant for use on multiple file selections in order to do exactly the thing you're talking about. Believe me, at this point, if you can think of a feature it's most likely already in there somewhere. 😆

(The other side of that coin is, of course, that all those features create complexity. But there's no reason anyone needs to use or even understand every feature, plenty of them are for very specific circumstances that won't apply to most users and can be ignored.)

ferdnyc avatar Jul 02 '21 23:07 ferdnyc

@ferdnyc I download last file sorted by name first, just because some people have selected the option to download files in sequential order. Sure, they'll download those files, but I want to make sure that the most rare file is downloaded first. I just don't like waiting and waiting, with files taking up space on my hard drive waiting for a seeder.

as-muncher avatar Jul 03 '21 00:07 as-muncher

So, just because we're talking about piece mode in this issue, I would like to mention something.

Java 1.8.0_202 (64 bit) Oracle Corporation c:\program files\biglybt\jre

SWT v4942r22, win32, zoom=100, dpi=120 Windows 10 v10.0, amd64 (64 bit) B2.8.0.0/4 az3

So, when I have my torrent paused, and then select only certain incomplete files in my torrent, leaving the 100% complete ones unselected, when I do a piece mode search for existing data files, when the process is complete, then I notice that some of my 100% complete files are now down to 99.4 or 99.7%. I hadn't selected those files, and they got corrupted. Perhaps if a piece is found to be missing, don't delete it out of the complete file? Sure, you can say the piece is missing, but it was only maybe 50% missing. The other 50% was in the file that was 100% completely downloaded. Maybe save that data from the piece mode copy and then when recheck is done, have it in store in case it's needed? Or maybe somehow mark those pieces spanning more than one file as special? I don't know. I just don't want my fully-downloaded files getting corrupted, especially since I hadn't selected those files. Maybe if piece mode copy is done, only do it for the specific files that were selected, ignoring the files that are unselected, even though there is a piece that maybe comes up missing that spans more than one file. Thanks for looking at this.

as-muncher avatar Aug 13 '21 20:08 as-muncher

I'm not seeing that - what is logged to the window when this happens? Are some pieces copied?

parg avatar Aug 20 '21 10:08 parg

Yeah, I don't seem to be able to make it happen, either. I really thought I would be able to, but piece mode seems too smart for me. Here's how I tested:

  1. Pick a completed torrent with multiple files, each multiple pieces long.
  2. Back up all of the files elsewhere, for safety
  3. Select one of the files in the torrent, victim.zip, for testing. Ensure that it has pieces on both sides (isn't the first or last file).
  4. Make a partial copy of victim.zip to /var/tmp/, zeroing a portion at the beginning that's at least one piece size long. Since the piece size for this torrent is 1.04 MB, I'll skip 2200 512-byte blocks. (2200 * 512 = 1126400 = 1.07 MB)
    dd if=/path/to/victim.zip of=/var/tmp/victim.zip skip=2200 seek=2200
    
  5. Stop the torrent, and change the priority for victim.zip to "Do not download". BiglyBT deletes everything except the first and last partial-pieces, preserving the "adjoining files have 100% completion" criterion.
  6. Right-click victim.zip, select "Search for existing data files", configure for Piece mode, enable "Search for skipped...", and point it at /var/tmp/

Result? BiglyBT copies 46 of the 47 missing pieces, marks the second piece as incomplete (because I partially zeroed it), and sets the file to 97.9% completion. But the already-existing partial first piece data is retained and incorporated, and therefore the preceding file is untouched. Ditto the partial piece at the other end of the file, and the next completed file in the torrent.
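For anyone who wants to double-check the numbers in that test, here's a quick sketch in Python. It assumes dd's default 512-byte block size and reads "MB" as MiB (1,048,576 bytes). The variable names are mine, not anything from BiglyBT, and the percentage only approximates the file's completion if the file is roughly 47 pieces long.

```python
# Sanity-check the figures from the test above.
# Assumptions: dd's default block size (512 bytes) and "MB" read as MiB.
BLOCK_SIZE = 512
SKIPPED_BLOCKS = 2200
MIB = 1024 * 1024

# Size of the zeroed (never-copied) region at the start of the partial copy.
zeroed_bytes = SKIPPED_BLOCKS * BLOCK_SIZE
zeroed_mib = zeroed_bytes / MIB

# 47 pieces were missing after "Do not download"; 46 of them were recovered.
missing_pieces = 47
copied_pieces = 46
completion = 100 * copied_pieces / missing_pieces

print(f"zeroed region: {zeroed_bytes} bytes = {zeroed_mib:.2f} MiB")
print(f"completion: {completion:.1f}%")
```

The zeroed region comes out slightly larger than one piece, which is why at least one whole piece can't be matched from the partial copy.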

I wouldn't have been surprised at all if the preceding and/or next file(s) had lost their 100% completion, because the file containing the other halves of those pieces got rewritten. But because it already contained the data in question (which, if the 100% for the file on either end is genuine, it would be expected to) there was no change to the completion state of any other file(s).
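To illustrate why a piece can belong to two files at once, here's a minimal sketch of the general layout (files laid end to end as one byte stream, then cut into fixed-size pieces). This is not BiglyBT's code; `files_touched_by_piece` is a hypothetical helper, and it assumes every file has a non-zero size.

```python
from bisect import bisect_right
from itertools import accumulate


def files_touched_by_piece(file_sizes, piece_length, piece_index):
    """Return the indices of the files that overlap the given piece."""
    # ends[i] is the stream offset just past the last byte of file i.
    ends = list(accumulate(file_sizes))
    total = ends[-1]

    start = piece_index * piece_length
    if start >= total:
        raise IndexError("piece index out of range")
    end = min(start + piece_length, total)  # exclusive; last piece may be short

    # bisect_right counts the files that end at or before the given byte,
    # which is exactly the index of the file containing that byte.
    first = bisect_right(ends, start)
    last = bisect_right(ends, end - 1)
    return list(range(first, last + 1))


# Three files of 2500, 3000 and 1500 bytes, with 1000-byte pieces.
sizes = [2500, 3000, 1500]
print(files_touched_by_piece(sizes, 1000, 2))  # piece straddling files 0 and 1
print(files_touched_by_piece(sizes, 1000, 3))  # piece entirely inside file 1
```

A piece that reports two file indices is one of the boundary pieces discussed above: its hash can only be verified when the data from both files is present.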

ferdnyc avatar Aug 20 '21 11:08 ferdnyc

I'll see if I can write down my method, then, and try it again. Perhaps then I can make a screenshot, and blackout the filenames.

as-muncher avatar Aug 21 '21 23:08 as-muncher

Java 1.8.0_202 (64 bit) Oracle Corporation c:\program files\biglybt\jre

SWT v4942r22, win32, zoom=100, dpi=120 Windows 10 v10.0, amd64 (64 bit) B2.8.0.0/4 az3

When I move content to the folder with the same torrent name, make that folder, then open the torrent in BiglyBT, it says "allocating" and then it says "downloading" without seeing if there is content there already. Content -> check files exist doesn't seem to do anything. There's no "force recheck". How do I make BiglyBT actually sense that there is data there? I see files "iqx5y3swvttx_<video file name>.mp4.!qB"

Just saw the reverse arrows yellow icon - perhaps make that a magnifying glass or something? BiglyBT shouldn't take this long to recheck, since there are only the screenshots, one incomplete video file, and one complete video file. Couldn't BiglyBT detect that there are only a few files?

BiglyBT couldn't even detect my files in the folder, and didn't create any of these iqx5 etc files either with zero values in there. Instead, it created a subfolder inside the folder called -1 with all those iqx5y files. I had to use content -> move data files to move it to the folder, and then delete the -1 folder. And then content -> move data files didn't work. I don't even know, then, how to point it to the right folder. BiglyBT is checking through supposedly 44 GiB of non-existent data, when really there is only about 2.5 GB of data there. I guess I'll have to delete the folder, then open the .torrent file in BiglyBT, let it create the directory and the empty iqx5y3swvttx files and then replace a video file with the complete one and one of the incomplete ones with the one that is also incomplete. I'll get back to you.

Also, even though this may not be the same issue, could you please put your date formats as yyyy/mm/dd? BiglyBT says that I had added this torrent already previously, but gives me a xx/yy/zz date format. I suppose, being in the U.S., that means mm/dd/yy, and here in Canada, it's dd/mm/yy, but if you would please just make it a yyyy/mm/dd format, you eliminate a lot of uncertainty. Thanks.

as-muncher avatar Aug 24 '21 01:08 as-muncher

If you select the option to add a 'unique prefix' to file names then you would need to

  1. add the torrent file to BiglyBT and make sure the files are allocated
  2. copy/move the existing content files to the correctly named file locations (with the prefix)
  3. force recheck

parg avatar Aug 24 '21 08:08 parg

I didn't think I told BiglyBT to add a unique prefix. Maybe it's doing that because I've added an additional extension, the (blah) .!qB extension. I think what I did before was that I was downloading a torrent and some of the video files in that torrent were at 100%, and thus did not have the .!qB extension but just the .mp4 extension. What I did, then, is that I had downloaded some of those less-than 100% files using another session, and they were on an external disk. I selected only those <100% files from the first session, did a "search for existing data files" from the second session, and didn't select any of the 100% done files. But then what happened is BiglyBT reduced some of those 100% files down to 99.5% or so, maybe because it figured it didn't have that one piece at the beginning or end, like 1/2 was in the 100% done file, and the other 1/2 was in the unfinished other adjacent video file. I am not sure right now if BiglyBT ended up corrupting my 100% done files or if just the percentages were lowered and the file was renamed with the additional extension. I'd have to check it out again. I'll make sure to disable the prefix thingy.

as-muncher avatar Aug 25 '21 00:08 as-muncher

A couple of things: could you add a "copy then piece" mode to search for existing data files? What happens is I usually do a copy mode first, but when it's done, BiglyBT checks the whole torrent, but then I want to search for pieces. I guess my hope was that if I do a copy mode search, that when BiglyBT finds whole files, that there would not be pieces missing at the end, but it's looking like it's the case right now. And then I have to wait for the whole torrent to finish checking before then doing a piece mode search, and then BiglyBT checks the whole torrent again. Do I have to wait for the torrent to finish checking the first time? Or just click cancel and then do a piece mode search? Unfortunately, even with piece mode search, BiglyBT is still missing a couple of chunks of complete files.

But then I also wanted to say that I have two torrents downloading, and their status is "Downloading + swarm merge". I'm not sure if I got to that status because I did a "link, internal" mode search for each one, and directed to look at the download folder for both of these torrents that have some files that are identical in each torrent. When one torrent downloaded a file, I saw that the other torrent also marked that as 100% complete too. Thanks so much for this feature!! I appreciate it. And I love when I have a torrent that is just sitting there incomplete, and BiglyBT finds a match in the swarm. That helps a lot.

Oh, and another question: Could you please automatically link identical files in different torrents? As described above. If I'm downloading a torrent, and a file there is identical to another file in another torrent, it would make sense to copy it to the second torrent automatically. Then I don't have to always do a search for existing data files.

as-muncher avatar Oct 29 '21 00:10 as-muncher

One improvement to think about: sometimes one torrent will have a different piece size than another torrent, both torrents identical, except for piece size. One could have downloaded at 83.2% and the other at 45.8%, but because the piece size is different, they can't share the same data. It would be neat if completed pieces in the smaller piece size torrent could be put into the larger piece size torrent, and vice versa, milking the most of the common downloaded data.
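Roughly what I have in mind, as a sketch in Python. This is only my guess at the bookkeeping, not how BiglyBT works: `pieces_covered` is a made-up name, it assumes both torrents lay out exactly the same bytes in the same order, and any piece it reports would still have to pass the target torrent's own hash check.

```python
def pieces_covered(done_src, src_piece_len, dst_piece_len, total_size):
    """Map verified pieces from one piece size onto another.

    done_src is the set of verified piece indices in the source torrent.
    Returns the set of destination piece indices whose every byte lies
    inside a verified source piece.
    """
    dst_count = -(-total_size // dst_piece_len)  # ceiling division
    covered = set()
    for d in range(dst_count):
        start = d * dst_piece_len
        end = min(start + dst_piece_len, total_size)  # exclusive
        # Source pieces overlapping the byte range [start, end).
        first_src = start // src_piece_len
        last_src = (end - 1) // src_piece_len
        if all(s in done_src for s in range(first_src, last_src + 1)):
            covered.add(d)
    return covered


# 8000 bytes total: one torrent uses 1000-byte pieces, the other 4000-byte pieces.
small_done = {0, 1, 2, 3, 5}
print(pieces_covered(small_done, 1000, 4000, 8000))  # small -> large
large_done = {1}
print(pieces_covered(large_done, 4000, 1000, 8000))  # large -> small
```

Going from small pieces to large ones only helps when every small piece under a large piece is already done, while one finished large piece fills in all the small pieces it covers.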

as-muncher avatar Jun 29 '22 19:06 as-muncher