Aleksey Vaneev

Results 55 comments of Aleksey Vaneev

You've misunderstood the concept. It's not streamed hashing. Sequential hashing assumes that the length of each item is also a part of the message.
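To illustrate the distinction, here is a minimal sketch. It uses `hashlib.blake2b` purely as a stand-in hash, and the helper names `streamed_digest` and `sequential_digest` are hypothetical. The explicit length prefix is one generic way to make item length part of the message; it is an assumption for illustration, not a description of how `komihash()` handles this internally.

```python
import hashlib
import struct


def streamed_digest(items):
    """Hash the items as one continuous byte stream (item boundaries are lost)."""
    h = hashlib.blake2b(digest_size=8)
    for item in items:
        h.update(item)
    return h.hexdigest()


def sequential_digest(items):
    """Hash the items one after another, with each item's length mixed in.

    The 64-bit little-endian length prefix is an illustrative assumption.
    """
    h = hashlib.blake2b(digest_size=8)
    for item in items:
        h.update(struct.pack("<Q", len(item)))  # length becomes part of the message
        h.update(item)
    return h.hexdigest()


# Same bytes overall, but split into items differently.
split_a = [b"ab", b"c"]
split_b = [b"a", b"bc"]

same_when_streamed = streamed_digest(split_a) == streamed_digest(split_b)
same_when_sequential = sequential_digest(split_a) == sequential_digest(split_b)
print(same_when_streamed, same_when_sequential)
```

A streamed hash sees only the concatenated bytes, so both splits give the same digest. Once item lengths are part of the message, the two splits hash differently.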

You can hash all your files using blocks of e.g. 1024 bytes (except the last block). It will be the same as streamed hashing for this given block size. Buffered...

I've implemented the streamed hashing after all, please check it out. It turned out to be a bit faster than the base `komihash()` function.

Code length is rarely a factor. Branching or the number of "pivot points" to reach the result matters most in hash functions.

> Code length is a huge factor in most use cases, esp. hash tables. Have you actually done any comparisons of the `always_inline` attribute vs. just `inline`? Compilers are extremely selective...

LZMA should be good: it's a freely available open-source library and can be integrated without much hassle. LZMA is also known as one of the best available compression algorithms, it's very...

Just a note - it looks like neural network-based compressors work great for text compression, but I do not think they'll handle "random data" compression well.

NN compressors are mainly language-based models. LZMA, on the other hand, works with bit-level patterns and is Markov chain based; it's not about "linear repeatability".

I also pointed out this bug a long time ago. Reini agreed to use ceil(collrate*4) as a "pass" condition (2), but it seems it was never introduced. Getting 2 collisions in this...

I agree that the Poisson distribution may be a good match. A 3.3% probability of an event of 2 collisions is not out of the ordinary.
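As a rough sketch of the arithmetic behind such a statement, the snippet below evaluates the Poisson probability of seeing a given number of collisions. The expected collision count of 0.3 is a hypothetical value chosen for illustration; the actual expected rate in the test under discussion is not stated here. Reading `ceil(collrate*4)` as an upper limit on the allowed collision count is likewise an assumption.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_tail(k, lam):
    """Probability of k or more events when the expected count is lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


expected_collisions = 0.3  # hypothetical expected collision count

p_exactly_2 = poisson_pmf(2, expected_collisions)
p_at_least_2 = poisson_tail(2, expected_collisions)

# Assumed reading of the "pass" condition: allow up to ceil(collrate * 4) collisions.
pass_limit = math.ceil(expected_collisions * 4)

print(f"P(exactly 2)  = {p_exactly_2:.4f}")
print(f"P(at least 2) = {p_at_least_2:.4f}")
print(f"pass limit    = {pass_limit}")
```

With this assumed expected count, exactly two collisions come out at roughly 3.3%, and the `ceil(collrate*4)` limit evaluates to 2, so such a result would pass rather than be flagged as a failure.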