
Resumable hashing

Open DannyZB opened this issue 6 years ago • 4 comments

Have any of you considered resumable hashing for rhash?

When hashing extremely large files, 20GB and up, being able to resume hashing from a previous position would help a ton.

Is this something you've considered?

DannyZB avatar Aug 21 '19 06:08 DannyZB

It's an interesting feature request. It could be implemented by serializing the internal librhash state into a "partly hashed" file.

But for now it's a low-priority FR, so I'm not sure when I'll get around to it.

rhash avatar Sep 27 '19 11:09 rhash

You have knowledge of the library.

Could you spend 15 minutes and give a rundown of where that code lives and what essentially needs to happen?

I might look into implementing it, but I would rather know where to look without learning the entire code base.

It's very useful for download automation where you need hashing: the download can be piped as a stream into rhash. Partial hashing is necessary to survive crashes (long downloads tend to have issues).

I.e. a way to send in the "partially hashed" file or load it after a crash. The same code can be reused to increase stability during crashes.

When you hash a 50 GB file and it breaks in the middle, that's a bit of a nightmare scenario.

DannyZB avatar Sep 27 '19 14:09 DannyZB

see also https://stackoverflow.com/questions/2130892/persisting-hashlib-state

you really just have to save and load

  • the file size
  • the internal state of the hasher functions
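
A minimal sketch of that save-and-load idea, in Python for brevity. It is only an illustration, not RHash code: it uses CRC32 because `zlib.crc32` accepts a running value, so the whole "internal state" is one integer. The function name `resumable_crc32`, the JSON checkpoint format, and the `stop_after` parameter (which simulates a crash) are all hypothetical.

```python
import json
import os
import zlib

CHUNK = 1 << 20  # read 1 MiB at a time


def resumable_crc32(path, state_path, chunk=CHUNK, stop_after=None):
    """Return the CRC32 of `path`, checkpointing progress to `state_path`.

    `stop_after` is a hypothetical knob that simulates a crash after N
    chunks; in that case the function returns None and leaves the
    checkpoint on disk so a later call can resume.
    """
    offset, crc = 0, 0
    if os.path.exists(state_path):
        # Resume: load the bytes-hashed count and the hasher state.
        with open(state_path) as f:
            saved = json.load(f)
        offset, crc = saved["offset"], saved["crc"]

    with open(path, "rb") as f:
        f.seek(offset)
        chunks = 0
        while True:
            if stop_after is not None and chunks >= stop_after:
                return None  # simulated interruption
            data = f.read(chunk)
            if not data:
                break
            crc = zlib.crc32(data, crc)  # continue from the running value
            offset += len(data)
            chunks += 1
            # Write the checkpoint to a temp file, then rename it, so a
            # crash mid-write never leaves a half-written state file.
            tmp = state_path + ".tmp"
            with open(tmp, "w") as s:
                json.dump({"offset": offset, "crc": crc}, s)
            os.replace(tmp, state_path)

    if os.path.exists(state_path):
        os.remove(state_path)  # finished, checkpoint no longer needed
    return crc
```

Checkpointing after every chunk keeps the sketch short; a real tool would probably checkpoint far less often. Cryptographic hashes need more state than one integer (chaining values plus a partially filled block), which is why the hasher itself has to offer a way to export it.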

milahu avatar Oct 23 '23 13:10 milahu

Since commit bbbe1beae95217b458ba43d4a90b7858325cca45, librhash provides the rhash_export() and rhash_import() functions to save and load its internal state. Now it's not hard to support resumable hashing of a single file.

Some things are not clear:

  • What should RHash do when many files or directory trees are processed? How and where should it store the information about which files were already hashed? Note that RHash usually writes hashing results to STDOUT, not to a file.
  • What should happen if RHash is recursively hashing a directory tree but, after resuming, the filesystem returns files in a different order?
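
One possible answer to both questions, purely as an illustration and not something RHash implements: append each finished file to a manifest keyed by relative path, and skip paths already present on resume. Because the lookup is by path, the order in which the filesystem returns entries stops mattering; sorting just makes runs reproducible. The names `hash_tree` and `file_crc32`, the manifest format, and the use of CRC32 are assumptions made for this sketch, which also assumes the manifest lives outside the tree being hashed.

```python
import os
import zlib


def file_crc32(path, chunk=1 << 20):
    """CRC32 of a whole file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while data := f.read(chunk):
            crc = zlib.crc32(data, crc)
    return crc


def hash_tree(root, manifest_path):
    """Hash every file under `root`, skipping files already in the manifest.

    Manifest lines look like "<8 hex digits>  <relative path>".
    A torn final line left by a crash is not handled in this sketch.
    """
    done = {}
    if os.path.exists(manifest_path):
        with open(manifest_path, encoding="utf-8") as m:
            for line in m:
                digest, _, rel = line.rstrip("\n").partition("  ")
                if rel:
                    done[rel] = digest

    with open(manifest_path, "a", encoding="utf-8") as m:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal, not required for correctness
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                if rel in done:
                    continue  # already hashed in a previous run
                digest = f"{file_crc32(full):08x}"
                m.write(f"{digest}  {rel}\n")
                m.flush()  # make the entry visible before moving on
                done[rel] = digest
    return done
```

This writes results to a file rather than STDOUT, which is exactly the trade-off raised above; a tool that must print to STDOUT would need a separate side file for the resume information.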

rhash avatar Nov 05 '23 20:11 rhash