Direct API for SSE-optimized crc32

Open w1ldptr opened this issue 10 years ago • 7 comments

Hi Mark,

Thanks for a great project. I'm looking for an efficient crc32 implementation for Intel 64. I see that zlib has a great implementation of an algorithm based on the PCLMULQDQ instruction. However, it is only used in deflate (crc_fold_copy). The generic crc32 computation function uses a pure C table-based implementation.

Is there any specific reason why crc32 uses the "unoptimised" version? Is there any way to change this upstream, or should I just branch zlib to implement a fast crc32 based on crc_fold_copy?

Thanks, Vlad

w1ldptr avatar Jun 18 '15 21:06 w1ldptr

We already have submissions for PCLMULQDQ-based CRCs. zlib will eventually include that.

madler avatar Jun 18 '15 21:06 madler

Are these submissions publicly available? I didn't find anything relevant in pull requests.

w1ldptr avatar Jun 18 '15 21:06 w1ldptr

Not SSE-related: @madler, would you be willing to accept an appropriately packaged https://github.com/antonblanchard/crc32-vpmsum ?

grooverdan avatar Dec 16 '15 03:12 grooverdan

@grooverdan I am not answering for madler here, but one problem I see is that it is GPL licensed, which is not really compatible with the zlib license. If that can actually be changed, then I would at least be interested in looking into it for our fork named zlib-ng.

From a very cursory look, it seems like a lot of code, and a good deal of it is asm. I personally prefer compiler intrinsics if possible/feasible, since they often make the code easier to maintain. (This might not be true for zlib though, due to different portability requirements.) We also have an IBM Canberra employee called 'daxtens' who has been looking into contributing POWER optimizations and a unit-testing framework; you might know him.

Dead2 avatar Dec 16 '15 11:12 Dead2

Thanks @Dead2. Licensing isn't too much of an issue, as it's IBM code and I've got the contacts to change the license per project if needed. I did look at your zlib-ng. Thanks for your guidance. I certainly know @daxtens and will check on his progress/priorities.

grooverdan avatar Dec 16 '15 23:12 grooverdan

Yep, so I was working on zlib-ng and got sidetracked by everything else going on. I'm still keen on getting the unit tests in and also on refactoring the CRC32 stuff in a pluggable way. I still think it is a good long-term project, especially for zlib-ng.

In the short run, we've also recently made crc32-vpmsum dual licensed under GPL and Apache 2. I don't know if Apache 2 is zlib-compatible, but we've at least shown that we can re-license it, so another one shouldn't be too hard.

With regards to compiler intrinsics, there isn't really support for doing this sort of stuff with intrinsics; there just aren't ones for ppc stuff in the compiler. (This is the downside of working on a smaller platform!) We'd probably just want to import the code, tweak the wrappers to get the right propagation of const, and then add the CPU feature detection stuff.
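
As a rough sketch of what "pluggable" plus feature detection could look like, assuming a plain function-pointer dispatch: every name below (`crc32_fn`, `crc32_generic`, `crc32_accel`, `cpu_has_fast_crc`, `crc32_dispatch`) is hypothetical and is not zlib, zlib-ng, or crc32-vpmsum code. The accelerated routine is only a stand-in that calls the portable one, and the detection stub always reports "not available".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is zlib or zlib-ng API. */
typedef uint32_t (*crc32_fn)(uint32_t crc, const unsigned char *buf, size_t len);

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320): the fallback. */
static uint32_t crc32_generic(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                       /* pre-condition the running value */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)   /* one shift/xor step per input bit */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                      /* post-condition */
}

/* Stand-in for an imported accelerated routine (e.g. a wrapper around
 * vendor asm). In this sketch it simply defers to the portable code. */
static uint32_t crc32_accel(uint32_t crc, const unsigned char *buf, size_t len)
{
    return crc32_generic(crc, buf, len);
}

/* Placeholder for real runtime CPU feature detection; always "no" here. */
static int cpu_has_fast_crc(void)
{
    return 0;
}

static crc32_fn crc32_impl;           /* chosen on first use */

/* Public entry point: pick an implementation once, then forward to it.
 * Not thread-safe as written; a real version would guard the first call. */
uint32_t crc32_dispatch(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (crc32_impl == NULL)
        crc32_impl = cpu_has_fast_crc() ? crc32_accel : crc32_generic;
    return crc32_impl(crc, buf, len);
}
```

Callers would only ever see `crc32_dispatch`, so swapping in a platform-specific routine would not change the calling code; the `const` on `buf` is the kind of thing the wrappers would need to propagate.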

daxtens avatar Dec 17 '15 01:12 daxtens

This issue is still open. Are there still open submissions for PCLMULQDQ-based CRC32? Would you accept a new one?

samrussell avatar Oct 15 '24 07:10 samrussell