folly Optimize crc32 & crc32c on NVIDIA Grace

This pull request adds hardware-accelerated routines for CRC32 and CRC32C on Arm AArch64 CPUs. The changes have been tested on NVIDIA Grace. In detail, it contains routines for:

- Computing CRC32 and CRC32C checksums over input data using the CRC intrinsics. On Grace/Neoverse V2, this can process 8 bytes/cycle.
- A vectorized implementation of the gf_multiply_crc32c_hw and gf_multiply_crc32_hw functions, which are used by the routines that merge partial CRC checksums. These functions are more or less a 1:1 translation of the x86 vectorized routines.

I've also introduced feature flags for the AES and SHA extensions on Arm CPUs. The feature checks for the vectorized functions are a bit messier than on x86 because a CPU can implement only a subset of these extensions.
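
For readers unfamiliar with the CRC intrinsics, here is a minimal sketch of the general technique. It is not the code from this PR. The names sketch_crc32c and sketch_crc32c_byte_sw are hypothetical, and the sketch assumes a little-endian target and the common convention of inverting the checksum on entry and exit. On AArch64 builds with the CRC extension enabled it consumes 8 bytes per __crc32cd call; elsewhere it falls back to a bitwise loop so it still compiles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>  // ACLE header that declares __crc32cb / __crc32cd
#define SKETCH_HAVE_ARM_CRC32 1
#else
#define SKETCH_HAVE_ARM_CRC32 0
#endif

// Hypothetical helper (not a folly function): bitwise CRC32C update for one
// byte, using the reflected Castagnoli polynomial 0x82F63B78.
inline uint32_t sketch_crc32c_byte_sw(uint32_t crc, uint8_t byte) {
  crc ^= byte;
  for (int i = 0; i < 8; ++i) {
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// Hypothetical entry point (not a folly function). Assumes little-endian
// byte order when loading 64-bit words for the hardware path.
inline uint32_t sketch_crc32c(const uint8_t* data, size_t len,
                              uint32_t crc = 0) {
  crc = ~crc;  // conventional initial inversion
#if SKETCH_HAVE_ARM_CRC32
  // Hardware path: one CRC32CX instruction per 8 input bytes.
  while (len >= 8) {
    uint64_t word;
    std::memcpy(&word, data, sizeof(word));  // unaligned-safe load
    crc = __crc32cd(crc, word);
    data += 8;
    len -= 8;
  }
  // Tail: remaining bytes one at a time.
  while (len > 0) {
    crc = __crc32cb(crc, *data++);
    --len;
  }
#else
  // Portable fallback so the sketch builds without the CRC extension.
  while (len > 0) {
    crc = sketch_crc32c_byte_sw(crc, *data++);
    --len;
  }
#endif
  return ~crc;  // conventional final inversion
}
```

A single dependent chain of CRC instructions like this one is limited by instruction latency rather than throughput, so reaching rates like the one quoted above generally requires several independent CRC streams whose partial results are merged afterwards, which is where the gf_multiply routines come in.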

This should resolve issue #2027.

May 16 '24 13:05 krenzland

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

May 16 '24 20:05 facebook-github-bot

Thanks for the review! I forgot to add that this should be built with a command such as python3 build/fbcode_builder/getdeps.py --allow-system-packages build --extra-cmake-defines '{"CMAKE_CXX_FLAGS": "-march=armv8.5-a+crc+crypto"}' or similar (+crypto could be replaced by +aes+sha2?) to enable all required features.
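
One way to check what a given -march string actually enables is to look at the ACLE feature macros the compiler predefines. The snippet below is a small sketch of that idea, not code from this PR; sketch_arm_features is a hypothetical name, and it assumes a compiler that defines __ARM_FEATURE_CRC32, __ARM_FEATURE_AES and __ARM_FEATURE_SHA2 as described in the ACLE (older compilers may only define __ARM_FEATURE_CRYPTO).

```cpp
#include <string>

// Hypothetical helper (not a folly function): reports which of the relevant
// Arm extensions were enabled at compile time, based on ACLE macros.
inline std::string sketch_arm_features() {
  std::string out;
#if defined(__ARM_FEATURE_CRC32)
  out += "crc ";   // CRC32/CRC32C instructions
#endif
#if defined(__ARM_FEATURE_AES)
  out += "aes ";   // AES instructions, which also gate 64-bit PMULL
#endif
#if defined(__ARM_FEATURE_SHA2)
  out += "sha2 ";  // SHA-1/SHA-256 instructions
#endif
  if (out.empty()) {
    out = "none";  // e.g. a non-Arm build or a baseline -march
  }
  return out;
}
```

Printing the result from a small test program built with the same CMAKE_CXX_FLAGS shows whether the CRC and vectorized code paths can be compiled in.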

May 21 '24 12:05 krenzland

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

May 22 '24 18:05 facebook-github-bot

@krenzland Hey, after internal discussions, we would like to ask you to move your contributions under folly/external/nvidia-crc32 so that the copyright lines are more clearly defined. You can still define them in the folly namespace.

AFAIU, CMake should automatically pick them up, as we have auto_source with recurse.

Thanks in advance

Jul 08 '24 22:07 meteorfox

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Aug 20 '24 20:08 facebook-github-bot

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Aug 28 '24 18:08 facebook-github-bot

@r1mikey merged this pull request in facebook/folly@8fc0e33470c2611973faa5f3abe8e6bc9845aaab.

Sep 26 '24 17:09 facebook-github-bot
