Support concurrent hash checking
Currently rtorrent doesn't support concurrent hash checking: when one torrent is being hash checked, another torrent's hash check can't start until the first one completes. This doesn't fit our use case. For example, the hash check of a torrent with big files may take a long time; meanwhile, another torrent with some small files finishes downloading and wants to start its own hash check, but it is blocked behind the torrent with the big files. This affects our business, so we are looking for a way to make rtorrent support concurrent hash checking.
It seems simple to do based on the current rtorrent code: just comment out the two lines below in the function Manager::receive_hashing_changed:
```cpp
// if (!tryQuick && foundHashing)
//   continue;
```
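For readers without the source handy, here is a toy, self-contained model of that gate. Everything except the `tryQuick`/`foundHashing` condition is invented for illustration (`QueuedTorrent` and the queue contents are not rtorrent's types):

```cpp
#include <iostream>
#include <vector>

// Stand-in for a torrent waiting in the hash queue; tryQuick mirrors
// whether valid resume data allows a quick check instead of a full re-hash.
struct QueuedTorrent {
  const char* name;
  bool tryQuick;
};

int main() {
  std::vector<QueuedTorrent> queue = {{"big-files.iso", false},
                                      {"small-files.zip", false}};

  bool foundHashing = false;  // is a full (non-quick) check already running?

  for (const auto& t : queue) {
    // The gate the OP comments out: a torrent needing a full check is
    // skipped while another full check is already in progress.
    if (!t.tryQuick && foundHashing)
      continue;

    std::cout << "start hash check: " << t.name << '\n';
    if (!t.tryQuick)
      foundHashing = true;
  }
}
```

With the gate in place, only the first full check starts and the second torrent stays queued; removing it lets both start at once, which is exactly the behavior change being proposed.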
We did some tests and it seems to work perfectly: torrents hash check concurrently after downloading their data. Are there any risks to this? Why was the limitation added to the current code?
It was a limitation from before hash checking was moved to a separate thread, and there should now be room for more concurrent checking.
Was this functionality added in the meantime?
I recompiled rtorrent and tested this with the change suggested by @jackejiang.
It works, but it does not actually appear to be faster; rtorrent uses the same amount of CPU. Does each hash check not get its own thread with this method? I guess that would be more work than commenting out two lines?
Concurrent hashing = disk thrashing = not reasonable.
@pyroscope I would suggest hash checking one torrent per physical drive.
And possibly have some rules in the rtorrent.rc file governing when the check of a torrent could be paused if much smaller torrents are waiting to be hash checked on the same drive.
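One hypothetical way "per physical drive" could be approximated is bucketing queued torrents by the device their data lives on. This is only a sketch, not anything rtorrent does; the paths are invented, and `st_dev` identifies a filesystem/block device rather than a true physical drive:

```cpp
#include <sys/stat.h>
#include <sys/types.h>

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
  // Torrents waiting for a hash check (paths invented for illustration).
  std::vector<std::string> queued = {"/data1/a.iso", "/data2/b.mkv",
                                     "/data1/c.bin"};

  // Keep only the first queued torrent per device so at most one check
  // runs on each. Note that st_dev sees a ZFS pool or RAID volume as ONE
  // device, which is exactly the limitation raised in the next comment.
  std::map<dev_t, std::string> firstPerDevice;
  for (const auto& path : queued) {
    struct stat st;
    if (stat(path.c_str(), &st) != 0)
      continue;  // missing path; skip it in this sketch
    firstPerDevice.emplace(st.st_dev, path);  // keeps only the first hit
  }

  for (const auto& [dev, path] : firstPerDevice)
    std::cout << "start hash check on device " << dev << ": " << path << '\n';
}
```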
@helipos Suppose I have a ZFS pool with a single vdev of 8 disks in RAID-Z2: how can rtorrent determine which physical drive is being hammered where? And believe it or not, that's on the low end of the complexity scale.
@kannibalox good point, it's probably beyond the capability of rtorrent to handle those ZFS systems. At the other end of the scale are multi-drive, multi-volume systems that could benefit greatly from concurrent hashing. Perhaps just have the option, but disabled by default.
Yeah I'm definitely down for this feature, I just don't think basing it on any kind of auto-detection of the system is going to be feasible.
BTW, a single item can live on several file systems (and the current code acknowledges that).
It'd be interesting to see this as an option for those of us with flash storage that requires deep queue depths in order to attain decent throughput numbers.
That would indeed be great; having the hashing speed capped at ~200 Mb/s with 100+ torrents, while the flash storage is capable of 1+ Gb/s, isn't ideal.
I support a more intelligent hashing feature as well. @infowolfe and @kannibalox both make good points that the bandwidth and random-access capabilities of your storage might be severely under-utilized with single-torrent validation.
Considerations for this:
- Storage sizes are growing as costs are shrinking. Folks have many terabytes or tens of terabytes of capacity. An 18TB disk is now less than USD 300.
- Files themselves are growing in size due to various factors (4K/UHD media, archival content, etc.). An individual file can be 80GB+ or a torrent can represent tens to hundreds of GB of content.
- Storage media are getting faster. Flash is cheaper, modern file systems are more accessible. ZFS or other RAID-like storage systems are available via other means such as FreeNAS or Unraid.
- Networks are getting faster. This affects network file systems where content is stored over the network on a NAS but the software accessing it (such as rtorrent) is on another machine. A used 10Gbps Ethernet card is under USD 200.
- Companies exist that sell "seed boxes" so folks have easy access to all the above.
I would suggest revisiting this old assumption about disk thrashing or slow hunks of iron with heads seeking. 10 HDDs in 5 mirrored pairs with ZFS can deliver over 1GB (gigabyte, not gigabit) per second sequentially, and 200+MB/s randomly.
I would even argue that single-torrent validation could itself be multithreaded. Given, for example, a large 100GB+ file on NVMe storage and multiple CPU cores, you would need to divide the torrent's pieces among threads in order to fully match the bandwidth of your storage.
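As a rough illustration of that idea, here is a self-contained sketch. Everything in it is invented for the example: `std::hash` stands in for the real per-piece SHA-1, the payload is an in-memory buffer rather than a file, and the strided piece assignment is chosen for brevity rather than sequential-read friendliness:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string_view>
#include <thread>
#include <vector>

int main() {
  // Stand-in payload: 64 pieces of 64 KiB each (a real client would be
  // reading these from disk).
  const std::size_t pieceLen = 64 * 1024;
  std::vector<char> data(64 * pieceLen, 'x');
  const std::size_t pieces = data.size() / pieceLen;

  std::vector<std::size_t> digests(pieces);
  const unsigned workers = std::max(1u, std::thread::hardware_concurrency());

  // Each worker hashes pieces w, w+workers, w+2*workers, ... so every
  // core stays busy; each piece index is written by exactly one thread.
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    pool.emplace_back([&, w] {
      for (std::size_t p = w; p < pieces; p += workers) {
        std::string_view piece(data.data() + p * pieceLen, pieceLen);
        digests[p] = std::hash<std::string_view>{}(piece);
      }
    });
  }
  for (auto& t : pool)
    t.join();

  std::cout << "hashed " << pieces << " pieces on " << workers << " threads\n";
}
```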
Transmission recently added support for skipping the full verification when a torrent is added (under certain conditions). See here. I think this is a valuable feature: running incremental validation on pieces as they are accessed would enable immediate seeding and spread the validation work across torrents more naturally.
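A toy model of that verify-on-first-access idea, with a hypothetical `verify_piece()` standing in for the real hash-and-compare step (none of these names come from Transmission or rtorrent):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Mark every piece "unverified" at add time, start seeding immediately,
// and hash a piece only the first time it is actually read.
struct LazyChecker {
  std::vector<bool> verified;

  explicit LazyChecker(std::size_t pieces) : verified(pieces, false) {}

  bool verify_piece(std::size_t p) {
    // Real code would hash piece p and compare against the metainfo.
    std::cout << "hashing piece " << p << " on first access\n";
    return true;
  }

  // Called on every read; pays the hashing cost at most once per piece.
  bool on_read(std::size_t p) {
    if (!verified[p]) {
      if (!verify_piece(p))
        return false;  // corrupt piece: would be re-downloaded
      verified[p] = true;
    }
    return true;
  }
};

int main() {
  LazyChecker checker(4);
  checker.on_read(2);  // hashed now
  checker.on_read(2);  // already verified, no rehash
}
```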
Perhaps one way to work around the problem the OP raised would be to allow some configuration of how rTorrent actually performs the checking procedure. rTorrent crashes on me sometimes (just earlier today, actually) and the only resolution is removing the *.libtorrent_resume files and restarting. It then of course takes ages to hash check all the torrents. It would be amazing if the hashing could proceed in order of smallest to largest torrent, which would leave the very few very large torrents for the very end and get 90+% of the work done much more quickly.
Just a thought.
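For what it's worth, the ordering itself would be trivial to express. A minimal sketch of the smallest-first idea, with types and queue contents invented for illustration (this is not rtorrent's actual queue code):

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// A torrent waiting to be rechecked, with its total payload size.
struct PendingCheck {
  std::string name;
  std::uint64_t bytes;
};

int main() {
  std::vector<PendingCheck> queue = {
      {"huge-4k-remux", 80ull << 30},  // ~80 GiB
      {"small-ebook", 5ull << 20},     // ~5 MiB
      {"medium-album", 700ull << 20},  // ~700 MiB
  };

  // Sort the recheck queue by size so the many small torrents finish
  // (and resume seeding) before the few huge ones are touched.
  std::sort(queue.begin(), queue.end(),
            [](const PendingCheck& a, const PendingCheck& b) {
              return a.bytes < b.bytes;
            });

  for (const auto& t : queue)
    std::cout << "check next: " << t.name << '\n';
}
```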