Direct API for SSE-optimized crc32
Hi Mark,
Thanks for a great project. I'm looking for an efficient crc32 implementation for Intel 64.
I see that zlib has a great implementation of an algorithm based on the PCLMULQDQ instruction.
However, it is only used in deflate (crc_fold_copy). The generic crc32 computation function uses a pure C table-based implementation.
Is there any specific reason why crc32 uses the "unoptimised" version? Is there any way to change this upstream, or should I just branch zlib to implement a fast crc32 based on crc_fold_copy?
Thanks, Vlad
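For readers unfamiliar with the "pure C table-based" approach mentioned above, here is a minimal sketch of a byte-at-a-time CRC-32 with a single 256-entry table. It is only an illustration: the names `crc32_bytewise` and `make_crc_table` are hypothetical, and zlib's own C code uses a more elaborate table scheme than this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not zlib's internal symbols. */
static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320. */
static void make_crc_table(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Same calling convention as zlib's crc32(): pass 0 to start, then feed
 * the previous return value back in to continue over more data.
 * The lazy table setup here is not thread-safe. */
uint32_t crc32_bytewise(uint32_t crc, const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (!crc_table_ready)
        make_crc_table();
    crc = ~crc;                       /* pre-condition the register */
    for (size_t i = 0; i < len; i++)  /* one table lookup per input byte */
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;                      /* post-condition */
}
```

Calling `crc32_bytewise(0, (const unsigned char *)"123456789", 9)` yields the standard CRC-32 check value `0xCBF43926`. The loop handles one byte per iteration, which is the per-byte cost a carry-less-multiply folding approach is meant to avoid.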
We already have submissions for PCLMULQDQ-based CRCs. zlib will eventually include that.
Are these submissions publicly available? I didn't find anything relevant in the pull requests.
Not SSE-related: @madler, would you be willing to accept an appropriately packaged https://github.com/antonblanchard/crc32-vpmsum ?
@grooverdan I am not answering for madler here, but one problem I see is that it is GPL-licensed, which is not really compatible with the zlib license. If that can actually be changed, then I would at least be interested in looking into it for our fork, zlib-ng.
From a very cursory look, it seems like a lot of code, and a good deal of it is asm. I personally prefer compiler intrinsics where feasible, since they often make the code easier to maintain. (This might not be true for zlib though, due to different portability requirements.) We also have an IBM Canberra employee, 'daxtens', who has been looking into contributing POWER optimizations and a unit-testing framework; you might know him.
Thanks @Dead2. Licensing isn't too much of an issue, as it's IBM code and I've got the contacts to change the license per project if needed. I did look at your zlib-ng. Thanks for your guidance. I certainly know @daxtens and will check on his progress/priorities.
Yep, so I was working on zlib-ng and got sidetracked by everything else going on. I'm still keen on getting the unit tests in and also refactoring the CRC32 stuff in a pluggable way. I still think it is a good long-term project, especially for zlib-ng.
In the short run, we've also recently made crc32-vpmsum dual-licensed under GPL and Apache 2.0. I don't know if Apache 2.0 is zlib-compatible, but we've at least shown that we can re-license it, so another one shouldn't be too hard.
With regard to compiler intrinsics, there isn't really support for doing this sort of stuff with intrinsics; there just aren't any for the PPC stuff in the compiler. (This is the downside of working on a smaller platform!) We'd probably just want to import the code, tweak the wrappers to get the right propagation of const, and then add the CPU feature detection stuff.
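To make the "pluggable CRC32 plus CPU feature detection" idea concrete, here is a rough sketch of how such a wrapper might look. Everything in it is an assumption for illustration: the names `crc32_fn`, `crc32_portable`, `crc32_accel`, `cpu_has_crc_accel` and `crc32_dispatch` are hypothetical, the accelerated slot is only a stand-in that calls the portable code, and the detection shown uses the GCC/Clang x86 builtin `__builtin_cpu_supports` rather than anything PPC-specific (a POWER build would need its own check).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common signature, const-correct on the input buffer. */
typedef uint32_t (*crc32_fn)(uint32_t crc, const unsigned char *buf, size_t len);

/* Portable bit-at-a-time fallback (reflected polynomial 0xEDB88320). */
static uint32_t crc32_portable(uint32_t crc, const unsigned char *buf, size_t len) {
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stand-in for an imported accelerated routine (PCLMULQDQ, vpmsum, ...).
 * Here it simply calls the portable code so the sketch stays runnable. */
static uint32_t crc32_accel(uint32_t crc, const unsigned char *buf, size_t len) {
    return crc32_portable(crc, buf, len);
}

/* Hypothetical feature check; returns 0 on platforms it does not know. */
static int cpu_has_crc_accel(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("pclmul") ? 1 : 0;
#else
    return 0;
#endif
}

static crc32_fn crc32_impl = NULL;

/* Pick an implementation on first use, then call through the pointer.
 * The lazy selection is not thread-safe as written. */
uint32_t crc32_dispatch(uint32_t crc, const unsigned char *buf, size_t len) {
    if (crc32_impl == NULL)
        crc32_impl = cpu_has_crc_accel() ? crc32_accel : crc32_portable;
    assert(crc32_impl != NULL);
    return crc32_impl(crc, buf, len);
}
```

With this shape, callers only ever see `crc32_dispatch`, so an asm or intrinsics-based routine can be dropped in behind `crc32_accel` without touching them, and unit tests can compare each implementation against the portable one.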
This issue is still open. Are there still open submissions for PCLMULQDQ-based CRC32? Would you accept a new one?