
Implementation of Stream Hashing

Open dan-2019 opened this issue 4 years ago • 2 comments

As far as I can see, all hash functions expect an in-memory buffer. I would like to compute hashes for very large files (way beyond RAM limits), which requires reading smaller chunks into memory and computing the hash in steps. Is this currently possible with FastHashes?

dan-2019 avatar Sep 04 '19 10:09 dan-2019

Hi! Thanks for your interest in my project. Currently, the library does not support stream hashing; it shouldn’t be a problem to integrate this functionality, but it may take some time to finish. The fastest way would be to use the built-in HashAlgorithm class, which already exposes stream hashing methods. If you want to help me out with this, I would really appreciate it. On a side note, I’m not sure all the algorithms can fit this logic... FarmHash, for example.

TommasoBelluzzo avatar Sep 05 '19 11:09 TommasoBelluzzo

Hello, Thank you for your kind reply.

At the risk of stating the obvious, this is how I analyze things. A generic way of doing stream hashing would be to break it into:

  • Initiate hash (called once)

  • Hash(buffer, count) (called at least once; more than once for streaming to make sense)

  • Finalize hash and get value (called once)

The Hash function should save its internal state (the variables will differ by algorithm) at the end of each call and restore it at the beginning of the next call. Trivially, the Initialize code sets the state to an initial value. Finalize does whatever is needed to produce the final hash code.
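The three steps above can be sketched as follows. This is only a minimal illustration in Python, not FastHashes code: it uses FNV-1a (64-bit) as a stand-in because its whole state is a single integer, and the names `StreamingFnv1a64`, `update`, `finalize` and `hash_stream` are hypothetical.

```python
class StreamingFnv1a64:
    """Illustrative streaming hasher; FNV-1a is a stand-in, not a FastHashes algorithm."""

    OFFSET_BASIS = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
    PRIME = 0x100000001B3              # standard FNV 64-bit prime
    MASK = 0xFFFFFFFFFFFFFFFF          # keep arithmetic within 64 bits

    def __init__(self):
        # "Initiate hash": set the state to its initial value.
        self._state = self.OFFSET_BASIS

    def update(self, buffer, count=None):
        # "Hash(buffer, count)": restore the state, consume `count` bytes, save the state.
        n = len(buffer) if count is None else count
        h = self._state
        for b in buffer[:n]:
            h ^= b
            h = (h * self.PRIME) & self.MASK
        self._state = h

    def finalize(self):
        # "Finalize": FNV-1a needs no extra work, the state is the hash value.
        return self._state


def hash_stream(stream, chunk_size=64 * 1024):
    """Hash a binary file-like object without loading it fully into memory."""
    hasher = StreamingFnv1a64()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.finalize()
```

With a byte-at-a-time algorithm like this one, the chunk size does not affect the result, so a file opened with `open(path, "rb")` can be passed straight to `hash_stream`. Block-oriented algorithms need more state than a single integer.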

The complexity lies in saving the state at different points in the code. For FarmHash 128 it would be logical to require the buffer length to be a multiple of 128 bytes. This would reduce the number of places where state needs to be saved.
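An alternative to forcing callers to pass block-aligned buffers would be to keep the leftover bytes inside the hasher and only process full blocks. The sketch below assumes a made-up 16-byte block and a toy mixing step; it is not FarmHash, and `BLOCK_SIZE`, `_mix_block` and `BufferedBlockHasher` are hypothetical names chosen for illustration.

```python
BLOCK_SIZE = 16                  # hypothetical block size, not FarmHash's
MASK = 0xFFFFFFFFFFFFFFFF        # keep arithmetic within 64 bits


def _mix_block(state, block):
    """Toy mixing of one full block into the state (illustration only)."""
    for i in range(0, BLOCK_SIZE, 8):
        word = int.from_bytes(block[i:i + 8], "little")
        state = ((state ^ word) * 0x100000001B3) & MASK
    return state


class BufferedBlockHasher:
    """Accepts chunks of any size by buffering the incomplete trailing block."""

    def __init__(self):
        self._state = 0x9E3779B97F4A7C15  # arbitrary seed for the toy algorithm
        self._tail = b""                  # bytes not yet forming a full block
        self._length = 0                  # total number of bytes seen so far

    def update(self, data):
        self._length += len(data)
        data = self._tail + bytes(data)
        full = len(data) - (len(data) % BLOCK_SIZE)
        # State only changes at block boundaries, so this is the single save point.
        for offset in range(0, full, BLOCK_SIZE):
            self._state = _mix_block(self._state, data[offset:offset + BLOCK_SIZE])
        self._tail = data[full:]

    def finalize(self):
        state = self._state
        if self._tail:
            # Zero-pad the last partial block; the length below disambiguates padding.
            state = _mix_block(state, self._tail.ljust(BLOCK_SIZE, b"\x00"))
        return (state ^ self._length) * 0x100000001B3 & MASK
```

The saved state is then the running value plus at most one partial block and a byte count, and the result is the same no matter how the input is split into chunks. Whether a given real algorithm can be restructured this way depends on how its final step uses the input, which is the open question for FarmHash.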

That said, I'm not sure this covers it all, and I am probably overlooking something that makes the above unfeasible. This also explains why it's not me designing these things, just using them once cleverer folks have implemented them. ;-)

dan-2019 avatar Sep 05 '19 14:09 dan-2019