liboqs
liboqs copied to clipboard
Fixing OQS ARM inconsistencies
This PR is focused on resolving the inconsistencies mentioned in #1304.
- [X] C_OR_NI_OR_ARM not working in distributable builds on ARM.
- [ ] Are the OQS_USE_ARM_*_INSTRUCTIONS flags actually being set?
- [X] Update
tests/system_info.c
macros for AES and SHA2 implementation selection to match those used insrc/common/aes/aes.c
and/src/common/sha2/sha2.c
- [X] ARM profiling scripts using cross-compilation toolchains
- [ ] Sanity check which optimizations are actually being used
Currently I believe that I have resolved points 1 and 3, and still need to look into 2. @baentsch, Douglas mentioned you may have resolved the profiling scripts issue, is that the case?
A couple of notes:
- I have reformatted the AES C_OR_NI_OR_ARM macros to match the formatting of the ones we use in SHA2. I have also renamed the sha2_ni.c to sha2_armv8.c after realizing that ni didn't strand for 'native instructions'.
- We were never using the AES optimization for distributed builds. When I fixed the issue with the macro, we had linking issues which are now resolved. This could have possibly been the issue @dstebila noticed in point 2?
- I still need to verify that this does indeed fix everything, but wanted to create a draft PR in case there is anything to discuss.
@baentsch, Douglas mentioned you may have resolved the profiling scripts issue, is that the case?
If you are referring to item 4 in https://github.com/open-quantum-safe/liboqs/issues/1304#issue-1372236460 (?), then I have only eliminated the use of a toolchain-file/cross-build for ARM with https://github.com/open-quantum-safe/profiling/pull/81 to build natively (on aarch64
for aarch64
). Whether this fully resolved that item, i.e., then properly honors OQS_OPT_TARGET=auto
to use all optimizations I did not check as I didn't check performance before and after.... Doing this now (looking at https://openquantumsafe.org/benchmarking/visualization/speed_kem.html) I don't see any significant changes in "aarch64" numbers, so I'm not sure anything was really improved by this. But then again, I don't know which actual ARM64 optimizations the AWS platforms have on which we run the code and what --if any-- improvements would/should have been expected. You probably are better able to judge from the feature sets (e.g., checking for "aarch64" in the raw run files such as https://openquantumsafe.org/benchmarking/visualization/2022-09-17/speed_kem.json) which differences between the run types (baseline (i.e., "distributable"), "ref" (i.e., no optimizations on) and "noport" (i.e., all optimizations on) should be expected given the documented CPU feature sets.
I have gone through with gdb and verified that arm optimized version of aes and sha are being used appropriately. I think this is ready for review.