smhasher icon indicating copy to clipboard operation
smhasher copied to clipboard

256-bit variant MurMurMurMur

Open DonaldTsang opened this issue 6 years ago • 1 comments

Will there ever be a 256-bit version of MurMur for avoiding hash collisions in larger datasets?

DonaldTsang avatar Sep 21 '19 09:09 DonaldTsang

I guess it would then not as much deviate from SHA-256, according to performance and distribution quality (note I said SHA, not SHA3). Anyway the performance comparison of both looks like:

MethodCalculate over 32 bytesCalculate over 1KB
PerformanceRatePerformanceRate
MM3-1280.236128 µs/#4234986 #/sec0.462609 µs/#2161655 #/sec
SHA-2560.487272 µs/#2052241 #/sec3.917825 µs/#255243 #/sec

Why I think so - because too many nested shifts/mixes expected by interim calculations on (non-native) 256-bit "integers". This is comparable to use MM64-128 compiled for 32-bit systems (due to non-native or missing 128-bit int's there) - just try to compare by yourself, you'd see similar performance "degradation".

Note that MM3 is good cascade-able (if we're talking about not crypto-strong hashing, to avoid collisions only)... thus combining N hashes you'll be able to build hash with sizes 256, 384, 512, etc bits. So you can simply try to use 2-blocks MM3-128 for hashing one by one the blocks of large strings each 32 bytes (128 bits): so calculate first 32-bytes block of string into one hash, second into another, 3rd again into first hash, etc, then you'd reach almost the same performance (comparable with 0.46µs for 1KB like above), and your both 128-bits hashes representing one 256-bits hash.

sebres avatar Sep 23 '19 16:09 sebres