digest-crc
digest-crc copied to clipboard
Would it make sense to have comparative benchmarks against other digests?
Would it make sense to have benchmarks for one or a few of the ruby built in digests as well, so that there is some ballpark comparison number?
Interesting idea. I guess it could show that Digest::CRC* isn't any slower than Digest::MD5, etc, which are implemented by libssl. Although, I would wager the CRC algorithms will be faster than other cryptographic digests, since CRC has less arithmetic operations. I also do not want to potentially encourage people to use CRC digests instead of cryptographic hashes (which should guarantee no collisions), just because CRC is simply faster.
Right. Though as md5 is not secure anymore, that should be fine. I tried to just add Digest::MD5 to the benchmark script, but the results surprised me as md5 seems faster. I wonder if someone spent way too much time optimizing that one..
The simple lookup table algorithm used in digest-crc is not very fast for modern general-purpose processors. Faster ones are in practical use.
For information:
- https://create.stephan-brumme.com/crc32/
- https://github.com/htot/crc32c#benchmarks ("crc32c sarwate" is equivalent to the algorithm used in digest-crc)
- Previously zlib 1.2.11 and liblzma used "Slicing by 4" or "Slicing by 8" as announced by Intel in 2005. https://github.com/madler/zlib/blob/v1.2.11/crc32.c#L247 https://github.com/tukaani-project/xz/blob/v5.4.3/src/liblzma/check/crc32_fast.c
- zlib 1.2.12 and later are currently parallelized by interleaving (outside of my understanding).
https://github.com/madler/zlib/blob/v1.2.13/crc32.c#L748
The commit message mentions a 3x speedup from 1.5。
Maybe https://github.com/google/crc32c/blob/1.1.2/src/crc32c_portable.cc is a fixed version of
crc32()in zlib 1.2.12 byN = 4, W = 4. - Some systems use the x86 CLMUL extension instruction, which allows even faster operations on the Galois field if available, and ARM has a similar extension instruction (outside my understanding). https://en.wikipedia.org/wiki/CLMUL_instruction_set https://github.com/intel/soft-crc/blob/b98ee8a444b0da33a3ea09c6e9c8fee4f664821d/crc.h#L361 (archived repository) https://github.com/tukaani-project/xz/blob/v5.4.3/src/liblzma/check/crc64_fast.c#L139 (CRC-64 model) https://github.com/ebiggers/libdeflate/blob/v1.18/lib/x86/crc32_pclmul_template.h#L125 https://github.com/ebiggers/libdeflate/blob/v1.18/lib/arm/crc32_pmull_wide.h#L55
My crc-turbo uses the "Slicing by 16" variant, but the zlib crc32() built into FreeBSD is about 1.6 times faster.
Also, lzma_crc64() by liblzma is about 3.6 times faster than my crc-turbo.
:thinking: I may write an acceleration patch for digest-crc. However, I cannot handle CLMUL extension instructions, so it will be equivalent to "Slicing by 16" or "Slicing by 8".