humanhash
Revised compress method
I made two changes to the "compress" method:
- it will return fewer than the target number of bytes if it is given a digest that is already smaller than the target size (instead of raising an error)
- it spreads the modulo (remainder) bytes around rather than dumping them all into the final byte, as sketched below (I think this might preserve some entropy, no?)
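For concreteness, here is a minimal sketch of the idea (the names and exact slicing are illustrative rather than the code in this PR; it assumes the same XOR folding that the existing compress uses):

```python
from functools import reduce
from operator import xor

def compress_new(bytes_, target):
    length = len(bytes_)

    # Change 1: a digest that is already at or below the target size
    # is returned as-is instead of raising an error.
    if target >= length:
        return list(bytes_)

    # Change 2: spread the remainder. The first (length % target)
    # segments each take one extra input byte, so no single output
    # byte absorbs all of the left-over input.
    seg_size, remainder = divmod(length, target)
    segments, pos = [], 0
    for i in range(target):
        size = seg_size + (1 if i < remainder else 0)
        segments.append(bytes_[pos:pos + size])
        pos += size

    # Fold each segment into one output value with XOR, as before.
    return [reduce(xor, seg, 0) for seg in segments]
```

With 7 input bytes and a target of 4, the segment sizes become 2, 2, 2, 1 instead of 1, 1, 1, 4.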
I'm maintaining a Python 3 fork of humanhash on GitHub and PyPI.
Can you add some comments to the code to explain what this is doing? And why it's better than the existing compress method? Sorry to dig this up from four years ago...
Happy to resurrect this! I've added comments to the compress method.
Why is this better? The old method split the bytes into the target number of segments and, after the even division, placed all remainder bytes into the final segment. This meant that the effect of the remainder bytes on overall entropy was confined to the final output byte. In the new method, the remainder bytes are drawn from throughout the input and distributed evenly among the target segments, allowing them to express more entropy. The compression per input byte is also more even, since the number of input bytes per output byte differs by at most 1.
For example:
```python
compress_old([123, 456, 789, 147], 4)
# -> [123, 456, 789, 147]
compress_old([123, 456, 789, 147, 258, 369, 321], 4)
# -> [123, 456, 789, 417] (only the last byte has changed)

compress_new([123, 456, 789, 147], 4)
# -> [123, 456, 789, 147]
compress_new([123, 456, 789, 147, 258, 369, 321], 4)
# -> [435, 902, 115, 321] (all 4 bytes have changed)
```
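For contrast, the existing compress behaves roughly like this (a paraphrase of the original humanhash source, not a verbatim copy), which is why only the final output moves when the extra input bytes arrive:

```python
from functools import reduce
from operator import xor

def compress_old(bytes_, target):
    length = len(bytes_)
    if target > length:
        raise ValueError("Fewer input bytes than requested output")

    # Even segments; every left-over byte is appended to the final
    # segment, so only the last output value absorbs the remainder.
    seg_size = length // target
    segments = [list(bytes_[i * seg_size:(i + 1) * seg_size])
                for i in range(target)]
    segments[-1].extend(bytes_[target * seg_size:])
    return [reduce(xor, seg, 0) for seg in segments]
```

Running both sketches on the lists above reproduces the outputs shown.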
As an aside, I have an equivalent compress method prepared for the JavaScript port, and I will create a pull request there if this is merged.
Thanks!
I'm maintaining the humanhash3 PyPI package, and if you can create a PR to my repo I'd be happy to merge it in. Thanks for the explanation! 😄