liboqs icon indicating copy to clipboard operation
liboqs copied to clipboard

Integrate Kyber from libjade

Open praveksharma opened this issue 1 year ago • 9 comments

This PR integrates Kyber512 and Kyber768 from libjade for x86_64 with and without AVX2 optimizations as described in #1466. Once Kyber1024 is made available in libjade the updated copy_from_upstream.py script will pull and integrate that code into liboqs as well, requiring only modifications to copy_from_libjade.yml and copy_from_upstream.yml.

This PR modifies copy_from_upstream.py and adds a 'libjade' mode in addition 'copy' and 'verify. This new 'libjade' functions independent of 'copy' mode and copies code for the KEM algorithms specified in copy_from_libjade.yml and updates relevant documentation and the library's CBOM. While this additional 'libjade' copy_from_upstream infrastructure can also pull signature algorithms it would likely require minor tweaks to the various templates used by copy_from_upstream.py; since libjade doesn't provide full support for any signature algorithm at the moment, those changes are best dealt with a separate PR.

UPDATE (from comment below):

At the moment, I've turned off weekly constant time tests when building with -DOQS_LIBJADE_BUILD=ON. Jasmin compiler doesn't include debug information in the x86 assembly it produces so Valgrind is unable to resolve specific function names and locations in source. I've attached output from memcheck for reference: Kyber768.txt Kyber512.txt

There is a closed PR in the Jasmin lang repo which added a feature to include DWARF2 information in the generated assembly but this update is not included in the latest release. Once merged, I can create an issue to track this and update the versions of Jasmin compiler used in CI after that feature is released.

[No] Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.) [No] Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged.)

praveksharma avatar Apr 03 '24 16:04 praveksharma

The CBOM generation script doesn't account for the BOM including multiple implementations of the same algorithm targeting the same platform. I've tweaked the script to include additional information about the implementation in the bom-ref for components pulled from libjade to avoid clashes (see /docs/cbom.json). Since the Cyclone DX CBOM guide only list a uniqueness criterion that any component's bom-ref must satisfy I think this should be okay; I'd appreciate your thoughts on this @bhess.

praveksharma avatar Apr 09 '24 21:04 praveksharma

The CBOM generation script doesn't account for the BOM including multiple implementations of the same algorithm targeting the same platform. I've tweaked the script to include additional information about the implementation in the bom-ref for components pulled from libjade to avoid clashes (see /docs/cbom.json). Since the Cyclone DX CBOM guide only list a uniqueness criterion that any component's bom-ref must satisfy I think this should be okay; I'd appreciate your thoughts on this @bhess.

I agree @praveksharma, for this use case this should be fine, only the bom-ref needs to be unique. Thanks for updating the script. I've also created issue #1753 since the CBOM spec was just merged to the official CycloneDX 1.6 release.

Do I understand it correctly that the libjade version of Kyber would co-exist with the pq-crystals implementation, but implementing the same algorithm for the same target? Is there a use case to have both implementations in liboqs?

bhess avatar Apr 10 '24 07:04 bhess

Do I understand it correctly that the libjade version of Kyber would co-exist with the pq-crystals implementation, but implementing the same algorithm for the same target? Is there a use case to have both implementations in liboqs?

This is partially correct. Unfortunately, the jasmin compiler produces AT&T assembly so libjade code isn't compatible with Microsoft Visual C++ which uses it's own assembly format. So liboqs would need to have PQClean ref code in addition to libjade's ref and avx2 code. The only redundant implementation is PQClean avx2.

At the moment, liboqs would also need to include Kyber1024 PQClean avx2 since libjade doesn't include Kyber1024. This would lead to a mixed set of implementations for Kyber:

  1. Kyber512: PQClean ref, libjade ref, libjade avx2
  2. Kyber768: PQClean ref, libjade ref, libjade avx2
  3. Kyber1024: PQClean ref, PQClean avx2

My thought was that this might lead to additional maintenance burden for an algorithm that we might want to drop in favour of ML-KEM soon (this summer?) when we don't need Kyber for CIRCL interop.

Aside from Kyber, here are some half-formed thoughts as to why we may want to include regular C99 implementations for algorithms in addition to formally verified assembly implementations from libjade (or similar projects):

  1. My understanding is that the formal verification process leads to a larger turnaround between changing jasmin code and having provably correct/secure assembly. In the event of a vulnerability it may be quicker to patch C99 code (either in liboqs or accept patches from upstream) than to wait for the formally verified code to be fixed. I'm not sure how practical this concern is.
  2. There may be projects unwilling to include assembly code due organisational/compliance reasons. Again, I'm not sure how practical this concern is.

praveksharma avatar Apr 10 '24 14:04 praveksharma

The libjade code wasn't accessible due a error in one of the templates. Thank you for pointing that out @SWilson4!

At the moment, I've turned off weekly constant time tests when building with -DOQS_LIBJADE_BUILD=ON. Jasmin compiler doesn't include debug information in the x86 assembly it produces so Valgrind is unable to resolve specific function names and locations in source. I've attached output from memcheck for reference: Kyber768.txt Kyber512.txt

There is a closed PR in the Jasmin lang repo which added a feature to include DWARF2 information in the generated assembly but this update is not included in the latest release. Once merged, I can create an issue to track this and update the versions of Jasmin compiler used in CI after that feature is released.

praveksharma avatar Apr 19 '24 20:04 praveksharma

A few comments from an older review (addressed in-person with @praveksharma) slipped in. I think I've cleaned them all up; sorry for any confusion.

SWilson4 avatar Apr 22 '24 20:04 SWilson4

@baentsch @SWilson4, thank you for your reviews! I think I've addressed all of them.

Am I overlooking CI actually activating and running this code?

You weren't Michael, I've fixed that. The code is being activated by the GitHub Actions workflows but not CircleCI (specifically the ubuntu-bionic-i386 image which from what I can tell isn't being used in GitHub Actions workflows). However, the code is being tested on both Linux and Darwin which are they only platforms libjade supports at the moment.

praveksharma avatar Apr 25 '24 03:04 praveksharma

libjade has now been relicensed to be CC0 or Apache 2: https://github.com/formosa-crypto/libjade/pull/121

dstebila avatar Apr 30 '24 12:04 dstebila

@baentsch @SWilson4, thank you for your reviews! I think I've addressed all of them.

Am I overlooking CI actually activating and running this code?

You weren't Michael, I've fixed that. The code is being activated by the GitHub Actions workflows but not CircleCI (specifically the ubuntu-bionic-i386 image which from what I can tell isn't being used in GitHub Actions workflows). However, the code is being tested on both Linux and Darwin which are they only platforms libjade supports at the moment.

Thanks for this improvement. So this means that if this is merged (and activated), Linux and MacOS will run automatically run libjade Kyber code and all else (Windows, Raspberry, IBM platforms, etc.) will run reference code? If so, is this already documented? Should it be? How?

baentsch avatar May 07 '24 12:05 baentsch

As per the performance discussion from yesterday: Indeed this code is faster than what we have in the system until now, so maybe you'd like to add some marketing speak to the build option "OQS_LIBJADE_BUILD" along those lines.

$ ../build-plain/tests/speed_kem Kyber768
Configuration info
==================
Target platform:  x86_64-Linux-6.5.0-26-generic
Compiler:         gcc (11.4.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       a5ec23cf19763d36a558b8358345823ae45d57e5
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2024-05-08 08:00:58
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768                             |            |                |                 |            |                           |           
keygen                               |     246687 |          3.000 |          12.161 |      2.489 |                     23015 |       4694
encaps                               |     207357 |          3.000 |          14.468 |      0.635 |                     27388 |        754
decaps                               |     276601 |          3.000 |          10.846 |      0.408 |                     20520 |        324
Ended at 2024-05-08 08:01:07

$ ./tests/speed_kem Kyber768
Configuration info
==================
Target platform:  x86_64-Linux-6.5.0-26-generic
Compiler:         gcc (11.4.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.1-dev
Git commit:       864be05738585e975540dd3ee1095d580904517f
OpenSSL enabled:  Yes (OpenSSL 3.0.2 15 Mar 2022)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2024-05-08 08:01:16
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768                             |            |                |                 |            |                           |           
keygen                               |     297927 |          3.000 |          10.070 |      2.127 |                     19048 |       4013
encaps                               |     243501 |          3.000 |          12.320 |      0.526 |                     23318 |        495
decaps                               |     303813 |          3.000 |           9.874 |      0.398 |                     18679 |        369
Ended at 2024-05-08 08:01:25

What I'd surely recommend before merge is adding at least "OQS_LIBJADE_BUILD" to the output of the "OQS build flags" output as otherwise there's really no indication as to what explains performance differences (in the above it's just my distinction of a "plain" build directory (without libjasmin) and a current build (on branch "ps-jasmin").... Even cooler of course would be output next to each algorithm name as to which code (level, version, etc.) it's running on the given platform, thus doing away with having to read all caveats as to what is active on a specific platform -- but that's arguably quite some effort. Created #1786 to track separately.

baentsch avatar May 08 '24 06:05 baentsch

The license files have been updated and the PR is ready to merge.

praveksharma avatar Jul 17 '24 18:07 praveksharma

Merging despite Travis failing, please see #1888.

praveksharma avatar Aug 18 '24 16:08 praveksharma

FYI @praveksharma https://github.com/open-quantum-safe/liboqs/actions/runs/10441963129/workflow

SWilson4 avatar Aug 20 '24 13:08 SWilson4

@SWilson4 @praveksharma May I ask a question: Does this mean that when a GH CI job is incorrectly formatted, it simply does not run and no UI information is given so one is tempted to (and in this case did) merge even though CI doesn't run? Second question: Is there no way one can check "at a glance" whether all configured GH jobs ran (and are configured) OK (along the lines of the Travis CI badge)? Worthwhile checking back with GH?

baentsch avatar Aug 21 '24 06:08 baentsch

@SWilson4 @praveksharma May I ask a question: Does this mean that when a GH CI job is incorrectly formatted, it simply does not run and no UI information is given so one is tempted to (and in this case did) merge even though CI doesn't run? Second question: Is there no way one can check "at a glance" whether all configured GH jobs ran (and are configured) OK (along the lines of the Travis CI badge)? Worthwhile checking back with GH?

I believe it does shows up, but not as a "red" failure (simply greyed out iirc). In this case it was probably even less obvious because of the Travis failures. I think this will be improved by #1880.

SWilson4 avatar Aug 21 '24 17:08 SWilson4

I believe it does shows up, but not as a "red" failure (simply greyed out iirc). In this case it was probably even less obvious because of the Travis failures. I think this will be improved by #1880.

Never mind, it doesn't show up at all on the interface next to the merge button. :( See below for an example of a PR which inherits the same failure. The two "action required" checks are Travis failures.

Screenshot from 2024-08-21 15-19-03

SWilson4 avatar Aug 21 '24 19:08 SWilson4

Never mind, it doesn't show up at all on the interface next to the merge button.

So I thought. This is not good: We start to regularly disregard CI failures (Travis) and don't see (look for) badly configured CI jobs, too (not exactly surprising as we already seem to expect CI to fail) -- and on top violate the least privilege principle wrt controlling who can do what to a project -- and discuss this all in closed-and-merged PRs, i.e., after the fox has left the barn (or whatever the proverb is :).

I think we ought to discuss how to improve things here. Any chance we can take some minutes for this in the next status call, @dstebila ? If not, my proposal below:

  • Do not accept PRs failing any CI but rather remove deemed irrelevant CI for good, creating an issue to activate it when possible; possibly document changes thus caused in PLATFORMS.md so this is becoming visible to the community and possibly creating motivation for the community to help solve things.
  • Add testing to highlight incorrect CI scripts and do not allow PRs before such errors are fixed, in effect making them a priority
  • PRs changing GH access permissions should be reviewed by at least as many people as changes to the logic of liboqs as errors here have the potential for more damage than simple deletion of a project, namely the uncontrolled integration of code into a code base that aims to be trusted (at least in my view).

baentsch avatar Aug 22 '24 07:08 baentsch

I think we ought to discuss how to improve things here. Any chance we can take some minutes for this in the next status call, @dstebila ?

Done! https://github.com/open-quantum-safe/tsc/blob/main/oqs-status-meetings/2024-08-27%20-%20OQS%20status%20meeting%20-%20agenda.md

dstebila avatar Aug 27 '24 00:08 dstebila