liboqs icon indicating copy to clipboard operation
liboqs copied to clipboard

Fixing OQS ARM inconsistencies

Open Martyrshot opened this issue 2 years ago • 1 comments

This PR is focused on resolving the inconsistencies mentioned in #1304.

  • [X] C_OR_NI_OR_ARM not working in distributable builds on ARM.
  • [ ] Are the OQS_USE_ARM_*_INSTRUCTIONS flags actually being set?
  • [X] Update tests/system_info.c macros for AES and SHA2 implementation selection to match those used in src/common/aes/aes.c and /src/common/sha2/sha2.c
  • [X] ARM profiling scripts using cross-compilation toolchains
  • [ ] Sanity check which optimizations are actually being used

Currently I believe that I have resolved points 1 and 3, and still need to look into 2. @baentsch, Douglas mentioned you may have resolved the profiling scripts issue, is that the case?

A couple of notes:

  • I have reformatted the AES C_OR_NI_OR_ARM macros to match the formatting of the ones we use in SHA2. I have also renamed the sha2_ni.c to sha2_armv8.c after realizing that ni didn't strand for 'native instructions'.
  • We were never using the AES optimization for distributed builds. When I fixed the issue with the macro, we had linking issues which are now resolved. This could have possibly been the issue @dstebila noticed in point 2?
  • I still need to verify that this does indeed fix everything, but wanted to create a draft PR in case there is anything to discuss.

Martyrshot avatar Sep 24 '22 16:09 Martyrshot

@baentsch, Douglas mentioned you may have resolved the profiling scripts issue, is that the case?

If you are referring to item 4 in https://github.com/open-quantum-safe/liboqs/issues/1304#issue-1372236460 (?), then I have only eliminated the use of a toolchain-file/cross-build for ARM with https://github.com/open-quantum-safe/profiling/pull/81 to build natively (on aarch64 for aarch64). Whether this fully resolved that item, i.e., then properly honors OQS_OPT_TARGET=auto to use all optimizations I did not check as I didn't check performance before and after.... Doing this now (looking at https://openquantumsafe.org/benchmarking/visualization/speed_kem.html) I don't see any significant changes in "aarch64" numbers, so I'm not sure anything was really improved by this. But then again, I don't know which actual ARM64 optimizations the AWS platforms have on which we run the code and what --if any-- improvements would/should have been expected. You probably are better able to judge from the feature sets (e.g., checking for "aarch64" in the raw run files such as https://openquantumsafe.org/benchmarking/visualization/2022-09-17/speed_kem.json) which differences between the run types (baseline (i.e., "distributable"), "ref" (i.e., no optimizations on) and "noport" (i.e., all optimizations on) should be expected given the documented CPU feature sets.

baentsch avatar Sep 25 '22 06:09 baentsch

I have gone through with gdb and verified that arm optimized version of aes and sha are being used appropriately. I think this is ready for review.

Martyrshot avatar Nov 15 '22 19:11 Martyrshot