
Issue hashing big network folders

Open oveand opened this issue 2 years ago • 5 comments

Hashing rather big folders, for example 175 GB and 10,000 files located on a Hitachi NAS network share, runs into some problems.

First of all, the File Explorer window freezes for a rather long time before OpenHashTab shows up. This is probably due to some file indexing going on before the GUI is actually shown.

Second, and more important, hashing actually fails with an error "Not enough server storage available to process this command". Some files are correctly hashed while others are not. This is probably not a direct OpenHashTab issue but perhaps related to how OpenHashTab accesses network shares.


Do you know any way of overcoming this?

oveand • May 08 '22 11:05

Hm... my first guess is that something doesn't let us open/lock that many files (OpenHashTab tries to open all files before attempting to read anything from any of them). This doesn't cause immediate resource problems on local files because Windows allows up to about 16 million handles per process, but I imagine SMB has some lower limits on this. The best solution would really be to hash things on the server. The other alternative is raising the open file limits, but with that many files you'll probably hit Samba's limit first, then Linux's ulimits, then maybe your local client's.

namazso • May 08 '22 11:05

I agree this is most likely a problem with the number of open file handles. Unfortunately the files are stored on a NAS device and are only accessible through SMB. I've been trying to hash in smaller batches, but it quickly gets complicated, as some folders have many small files and the folder nesting is a bit complicated :(
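As a possible workaround outside OpenHashTab, the batching described above can be sketched as a small script that walks the share and keeps only one file open at a time, so the number of handles held on the NAS stays constant regardless of folder nesting. This is a minimal sketch, assuming Python is available on the client; `hash_file` and `hash_tree` are hypothetical helper names, and SHA-256 is just an example algorithm.

```python
import hashlib
import os


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash one file in chunks; the handle is closed before returning."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, algorithm="sha256"):
    """Yield (path, hex digest) pairs, opening a single file at a time."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            try:
                yield path, hash_file(path, algorithm)
            except OSError as error:
                # Report the failure and continue with the remaining files.
                yield path, "ERROR: {}".format(error)
```

Usage would look like `for path, digest in hash_tree(r"\\nas\share\folder"): print(digest, path)`, where the UNC path is a placeholder. This trades speed for a minimal handle count, which matters on a high-latency link.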

But I see this is a design decision that all files are opened before actually working on them. This is probably also the reason why OpenHashTab can take up to 5 minutes before the GUI shows when working with SMB shares (the link is 1 Gbit, but latency is probably significant).

oveand • May 08 '22 11:05

I have some plans to rewrite / refactor a major part of the code for 4.0. The API changes to AlgorithmsDll were actually a first step towards that (making the API interop-friendly for use with other languages like Rust or C#, in case I decide to rewrite in one of those).

> But I see this is a design decision that all files are opened before actually working on them.

That is correct; the handles are used for figuring out canonical paths and similar. I plan to rewrite most of the path handling with the undocumented NT API, as it's much easier from a security perspective (there's already quite a bit of conversion mess going on with DOS short paths, DOS paths, and NT paths), and I will see if I can get rid of opening files early. However, it would still need at least 512 files open concurrently, as that's how many we queue up for Windows to read.
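To illustrate what a bounded number of concurrently open files looks like in practice, here is a minimal sketch using a thread pool whose size caps the open handles. It is not how OpenHashTab queues reads with Windows; the pool-based approach, the `hash_many` helper, and the `MAX_OPEN_FILES` name are assumptions for illustration, with 512 taken from the figure quoted above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative cap, taken from the figure mentioned in this thread.
MAX_OPEN_FILES = 512


def hash_file(path, chunk_size=1024 * 1024):
    """Open, hash, and close one file; each worker holds one handle."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, max_open=MAX_OPEN_FILES):
    """Hash all paths with at most max_open files open at any moment."""
    paths = list(paths)
    # Each worker thread processes one file at a time, so the pool size
    # is an upper bound on simultaneously open handles.
    with ThreadPoolExecutor(max_workers=max_open) as pool:
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Because files are opened only when a worker picks them up, the total number of files in the selection no longer affects how many handles the server sees at once.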

> This is probably also the reason why OpenHashTab can take up to 5 minutes before the GUI shows when working with SMB shares (the link is 1 Gbit, but latency is probably significant).

Partially. Folders are traversed too, which takes some time, and all of this code is synchronous / blocking. I might try rewriting it in an async / multithreaded way in the future, but that needs lots of coordination. Since I want a future version to allow adding files after the initial bunch (maybe even mid-process), this part of the code will need quite some overhaul anyway.
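The general idea of non-blocking traversal can be sketched as a background thread that walks the folders and hands paths to the consumer through a queue, so the consumer can start working before the walk finishes. This is only an illustration of the pattern in Python, not OpenHashTab's code or its planned design; `walk_in_background` and the queue size are hypothetical.

```python
import os
import queue
import threading

_DONE = object()  # sentinel marking the end of the traversal


def walk_in_background(root, maxsize=1024):
    """Yield file paths under root while a worker thread keeps walking."""
    found = queue.Queue(maxsize=maxsize)

    def producer():
        try:
            for folder, _subfolders, filenames in os.walk(root):
                for name in filenames:
                    found.put(os.path.join(folder, name))
        finally:
            # Always signal completion, even if the walk raised.
            found.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = found.get()
        if item is _DONE:
            return
        yield item
```

The bounded queue keeps the traversal from running arbitrarily far ahead of the consumer, which is one simple form of the coordination mentioned above.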

namazso • May 08 '22 12:05

Experiments show that approximately 16,000 files can be open at once on the Hitachi NAS storing our files, so 512 concurrently open files should not be an issue for most relatively modern systems. The 16,000 limit seems to be global, though, so running several OpenHashTab instances in parallel reduces the number of files each one can handle.

I understand that handling this means a radical change to the current file handling design, and I'm grateful that you are taking it into account for v4.0.

oveand • May 08 '22 16:05