circl
circl copied to clipboard
Implement Classic McEliece
This is a draft of a Classic McEliece that implements all the parameters specified in the round 3 submission. It passes the KAT tests provided in the official website.
The current major problem right now is that public key generation takes so much time it causes tests on CI to timeout and fail. This is because the Gaussian Elimination loop inside pkGen
in pk_gen.go
can take millions of iterations for larger parameters. Vectorization should be able to make it faster by like 30x but I haven't implemented it yet. Currently there are only generic Go implementations.
Some test data outside of the KAT test are lifted from a Rust implementation. These test data are to ensure that some specific functions work correctly.
Closes #360
@octeep thanks for pushing this. Also, could you please squash your commits.
We will review this code closely.
Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).
What about adding a build constraint to compile this code only on x64?
What do you think @bwesterb ?
What do you think @bwesterb ?
I'll have a look tomorrow.
@octeep thanks for pushing this. Also, could you please squash your commits.
We will review this code closely.
Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).
What about adding a build constraint to compile this code only on x64?
What do you think @bwesterb ?
I've squashed my commit. This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon (See this). It might be better to wait for them to publish the round 4 so this PR can be updated accordingly.
Thanks for this @octeep
This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon
Ok.
Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (ref
or rust)?
I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.
Some initial comments:
- Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.
- Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?
- Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.
- The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.
Thanks for this @octeep
This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon
Ok.
Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (
ref
or rust)?
Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.
I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.
Some initial comments:
- Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.
I unfortunately do not have much experience with Go optimization. I also want to note that when I compile the C optimized implementation with vectorization explicitly disabled, the performance is only slightly faster than Go. When I compile the C optimized implementation and tell gcc to show the optimization tree, I see that gcc has performed vectorization automatically, which is why it is much faster than Go.
The vec
implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?
- Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?
If we are talking about optimizing pk_gen
specifically then I don't think it would be too hard to add SIMD optimizations. Just add some build flags in the pk_gen
template and some slight refactoring should be enough.
If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.
- Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.
I will look into that. I don't think reducing it to a few hashes can work since the test cases contain sample inputs. Maybe we can compress it or represent these inputs in raw bytes rather than as hexadecimal ASCII.
- The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.
I will try to document the code to the best of my understanding.
I have managed to trim down testdata.txt
to 90kb. Would this be acceptable?
Thanks again for your efforts!
Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.
Ok! Please add appropriate attributions.
The
vec
implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?
Yes, that's a great plan!
(For those reading along: vec
packs a few field elements into uint64
s.)
If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.
I'll get back to this when I found time to properly study the code.
I will try to document the code to the best of my understanding.
Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.
I have managed to trim down
testdata.txt
to 90kb. Would this be acceptable?
@armfazh ?
I have managed to trim down
testdata.txt
to 90kb. Would this be acceptable?
After bzip2-ing that file, size gets down to 50 KB, which I think is acceptable.
As a general comment, please introduce // TODO
comments in places where code optimization is needed, such as better number representation or SIMD.
They might be not resolved in this PR, but flagging them is good for further development.
Thanks again for your efforts!
Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.
Ok! Please add appropriate attributions.
I'm not sure how I should add attributions here. I have added a notice on each mceliece.go
saying that some code are translated from the Rust implementation, and included the author's name and github link. Do I need to do more than that? (e.g. include their MIT license)
The
vec
implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?Yes, that's a great plan!
I am trying to do that in the vec
branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.
(For those reading along:
vec
packs a few field elements intouint64
s.)If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.
I'll get back to this when I found time to properly study the code.
I will try to document the code to the best of my understanding.
Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.
Ok! Please add appropriate attributions.
I have added a notice on each
mceliece.go
saying that some code are translated from the Rust implementation, and included the author's name and github link.
That should be enough for now.
I am trying to do that in the
vec
branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.
Cool.
Ok good news: I managed to port the vec
implementation of pk_gen
to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.
Ok good news: I managed to port the
vec
implementation ofpk_gen
to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.
Good job!
I've ported the vec
implementation for all the systematic parameters (parameters that don't end with a f
). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.
After porting the vec
implementation to semi-systematic parameters it should be able to finish within 10 minutes.
I've ported the
vec
implementation for all the systematic parameters (parameters that don't end with af
). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.After porting the
vec
implementation to semi-systematic parameters it should be able to finish within 10 minutes.
I've now also ported vec
for semi-systematic parameters. On arm64 CI it now completes under 10 minutes so it passes CI now.
The round 4 version has been updated, see https://classic.mceliece.org/nist.html
The round 4 version has been updated, see https://classic.mceliece.org/nist.html
I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.
Hello :wave: , just wanted to say this change is desired, and that I hope code review proceeds in the near future! :rocket:
Hello, just wanted to inquire if there's been any movement on this! Would love to see this get reviewed soon.
Hi, I will continue with the review of this PR in the next couple days. Thanks for your patience.
Any updates on this one?
I am also really interested in this PR as I am planning to use this for the Go implementation of the WireGuard Rosenpass Key Exchange.
I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:
https://classic.mceliece.org/mceliece-pc-20221023.pdf
I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.
The file headers are still mentioning round 3:
// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism
// mceliece348864 as submitted to round 3 of the NIST PQC competition and
// described in
@armfazh Can we remove the on-hold label from the issue? Round 4 specs have been released. There is nothing blocking this from getting finished :)
https://classic.mceliece.org/nist.html
I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.
The file headers are still mentioning round 3:
// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism // mceliece348864 as submitted to round 3 of the NIST PQC competition and // described in
Thanks for pointing that out, I've fixed that in a new commit.
I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:
https://classic.mceliece.org/mceliece-pc-20221023.pdf
It would appear that liboqs is still using the round 3 specification which explains the difference in ciphertext length. Mine follows the round 4 specification and the produced ciphertexts are checked against the KAT tests provided by the official Classic McEliece website.
Hi @bwesterb, @pufferffish, @armfazh,
is there something I can do to support you in getting this implementation merged? Like code-review, testing or something similar?
Thanks :)
@stv0g we appreciate a code review, our team will be providing feedback as we get more bandwidth.
@stv0g thanks for the review. It's been a while since I've worked on this and recently I've been busy with work so I can't promise I can get back to your reviews soon. If you'd like to I can give you write access to my fork repo so you can directly work on this pull request.