circl Implement Classic McEliece

This is a draft of a Classic McEliece that implements all the parameters specified in the round 3 submission. It passes the KAT tests provided in the official website.

The current major problem right now is that public key generation takes so much time it causes tests on CI to timeout and fail. This is because the Gaussian Elimination loop inside pkGen in pk_gen.go can take millions of iterations for larger parameters. Vectorization should be able to make it faster by like 30x but I haven't implemented it yet. Currently there are only generic Go implementations.

Some test data outside of the KAT test are lifted from a Rust implementation. These test data are to ensure that some specific functions work correctly.

Closes #360

Oct 04 '22 03:10 whyvl

@octeep thanks for pushing this. Also, could you please squash your commits.

We will review this code closely.

Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).

What about adding a build constraint to compile this code only on x64?

What do you think @bwesterb ?

Oct 10 '22 21:10 armfazh

What do you think @bwesterb ?

I'll have a look tomorrow.

Oct 10 '22 21:10 bwesterb

@octeep thanks for pushing this. Also, could you please squash your commits.

We will review this code closely.

Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).

What about adding a build constraint to compile this code only on x64?

What do you think @bwesterb ?

I've squashed my commit. This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon (See this). It might be better to wait for them to publish the round 4 so this PR can be updated accordingly.

Oct 10 '22 21:10 whyvl

Thanks for this @octeep

This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon

Ok.

Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (ref or rust)?

I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.

Some initial comments:

Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.
Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?
Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.
The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.

Oct 11 '22 22:10 bwesterb

Thanks for this @octeep

This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon

Ok.

Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (ref or rust)?

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.

Some initial comments:

Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.

I unfortunately do not have much experience with Go optimization. I also want to note that when I compile the C optimized implementation with vectorization explicitly disabled, the performance is only slightly faster than Go. When I compile the C optimized implementation and tell gcc to show the optimization tree, I see that gcc has performed vectorization automatically, which is why it is much faster than Go.

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?

If we are talking about optimizing pk_gen specifically then I don't think it would be too hard to add SIMD optimizations. Just add some build flags in the pk_gen template and some slight refactoring should be enough.

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.

I will look into that. I don't think reducing it to a few hashes can work since the test cases contain sample inputs. Maybe we can compress it or represent these inputs in raw bytes rather than as hexadecimal ASCII.

The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.

I will try to document the code to the best of my understanding.

Oct 11 '22 23:10 whyvl

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

Oct 12 '22 01:10 whyvl

Thanks again for your efforts!

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

Ok! Please add appropriate attributions.

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Yes, that's a great plan!

(For those reading along: vec packs a few field elements into uint64s.)

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

I'll get back to this when I found time to properly study the code.

I will try to document the code to the best of my understanding.

Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.

Oct 12 '22 13:10 bwesterb

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

@armfazh ?

Oct 12 '22 13:10 bwesterb

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

After bzip2-ing that file, size gets down to 50 KB, which I think is acceptable.

Oct 12 '22 18:10 armfazh

As a general comment, please introduce // TODO comments in places where code optimization is needed, such as better number representation or SIMD. They might be not resolved in this PR, but flagging them is good for further development.

Oct 12 '22 18:10 armfazh

Thanks again for your efforts!

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

Ok! Please add appropriate attributions.

I'm not sure how I should add attributions here. I have added a notice on each mceliece.go saying that some code are translated from the Rust implementation, and included the author's name and github link. Do I need to do more than that? (e.g. include their MIT license)

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Yes, that's a great plan!

I am trying to do that in the vec branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.

(For those reading along: vec packs a few field elements into uint64s.)

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

I'll get back to this when I found time to properly study the code.

I will try to document the code to the best of my understanding.

Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.

Oct 13 '22 05:10 whyvl

Ok! Please add appropriate attributions.

I have added a notice on each mceliece.go saying that some code are translated from the Rust implementation, and included the author's name and github link.

That should be enough for now.

I am trying to do that in the vec branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.

Cool.

Oct 13 '22 07:10 bwesterb

Ok good news: I managed to port the vec implementation of pk_gen to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.

Oct 13 '22 08:10 whyvl

Ok good news: I managed to port the vec implementation of pk_gen to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.

Good job!

Oct 13 '22 09:10 bwesterb

I've ported the vec implementation for all the systematic parameters (parameters that don't end with a f). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.

After porting the vec implementation to semi-systematic parameters it should be able to finish within 10 minutes.

Oct 14 '22 04:10 whyvl

I've ported the vec implementation for all the systematic parameters (parameters that don't end with a f). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.

After porting the vec implementation to semi-systematic parameters it should be able to finish within 10 minutes.

I've now also ported vec for semi-systematic parameters. On arm64 CI it now completes under 10 minutes so it passes CI now.

Oct 14 '22 09:10 whyvl

The round 4 version has been updated, see https://classic.mceliece.org/nist.html

Oct 25 '22 17:10 armfazh

The round 4 version has been updated, see https://classic.mceliece.org/nist.html

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

Oct 25 '22 19:10 whyvl

Hello :wave: , just wanted to say this change is desired, and that I hope code review proceeds in the near future! :rocket:

Jan 28 '23 15:01 d-z-m

Hello, just wanted to inquire if there's been any movement on this! Would love to see this get reviewed soon.

Feb 22 '23 03:02 d-z-m

Hi, I will continue with the review of this PR in the next couple days. Thanks for your patience.

Feb 22 '23 06:02 armfazh

Any updates on this one?

Mar 29 '23 19:03 mariobm

I am also really interested in this PR as I am planning to use this for the Go implementation of the WireGuard Rosenpass Key Exchange.

Jun 08 '23 05:06 stv0g

I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:

https://classic.mceliece.org/mceliece-pc-20221023.pdf

Jun 08 '23 05:06 stv0g

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

The file headers are still mentioning round 3:

// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism
// mceliece348864 as submitted to round 3 of the NIST PQC competition and
// described in

Jun 08 '23 06:06 stv0g

@armfazh Can we remove the on-hold label from the issue? Round 4 specs have been released. There is nothing blocking this from getting finished :)

https://classic.mceliece.org/nist.html

Jun 08 '23 06:06 stv0g

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

The file headers are still mentioning round 3:
// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism
// mceliece348864 as submitted to round 3 of the NIST PQC competition and
// described in

Thanks for pointing that out, I've fixed that in a new commit.

I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:

https://classic.mceliece.org/mceliece-pc-20221023.pdf

It would appear that liboqs is still using the round 3 specification which explains the difference in ciphertext length. Mine follows the round 4 specification and the produced ciphertexts are checked against the KAT tests provided by the official Classic McEliece website.

Jun 09 '23 03:06 whyvl

Hi @bwesterb, @pufferffish, @armfazh,

is there something I can do to support you in getting this implementation merged? Like code-review, testing or something similar?

Thanks :)

Aug 01 '23 11:08 stv0g

@stv0g we appreciate a code review, our team will be providing feedback as we get more bandwidth.

Aug 02 '23 19:08 armfazh

@stv0g thanks for the review. It's been a while since I've worked on this and recently I've been busy with work so I can't promise I can get back to your reviews soon. If you'd like to I can give you write access to my fork repo so you can directly work on this pull request.

Aug 04 '23 17:08 whyvl

circl circl copied to clipboard

Implement Classic McEliece

circl
circl copied to clipboard