secp256k1
secp256k1 copied to clipboard
Replace the 64-bit C field implementation by fiat-crypto output
This PR is result of https://github.com/bitcoin-core/secp256k1/issues/1261 In a nutshell, it replaces the current C-Implementation for the field arithmetic multiply and Square with the proven-correct implementation from the Fiat Cryptography Framework.
Hi, I think I need some help here, too.
the first attempt, in 273bda3 used (as per fiat crypto default) the type __int128, but this fails on some MSVCs
the second attempt, in fc926f4 used "int128.h", but similarly fails (e.g. << not defined on structs).
and the third attempt in baa416f fails similar as well.
Would we need to rewrite the code from Fiat Cryptography to use the secp256k1_u128_mul and its siblings? But then I believe we
- end up with the same code that is currently there
- and we're not guarded by the formal assurances anymore that we would have with the untouched Fiat-C code.
Ok, yes, that is a problem...
Would we need to rewrite the code from Fiat Cryptography to use the
secp256k1_u128_muland its siblings?
Yeah, I think so.
But then I believe we
- end up with the same code that is currently there
Well, that is somewhat expected, no?
- and we're not guarded by the formal assurances anymore that we would have with the untouched Fiat-C code.
The good thing is that our int128 implementation is formally proven correct, too. That means in practice we get meaningful guarantees. However, then things seem to get inelegant, and we need to fiddle a bit. We could translate the code "manually". It's probably easy to inspect, but this defeats the idea of using a code generator, at least somewhat. (We'd need to redo this every time we want to update the Fiat-Crypto code... Most likely that doesn't happen very frequently, though.) The most elegant solution will be changes in fiat-crypto, but that means more work on their side. I commented in https://github.com/mit-plv/fiat-crypto/issues/1560 about this.
Unless I am mistaken, there is no correctness proof for the translation from whatever Fiat Crypto's algorithm representation format is into C code, so mucking about with that translation layer won't lose any formal assurances because there weren't any for that layer to begin with.
Well, that is somewhat expected, no?
Yes, but then I don't see the point adapting in the first place.
I've just read through https://github.com/mit-plv/fiat-crypto/issues/1560, seems like they are aware of this. I also like Andres's three level approach. Either we'd wait for that (which is the most sensible I believe) or instead of replacing the current implementation, we can add the fiat-c and use it if the compiler supports it, and fall back to the existing one if not.
It was my intention on proving the field, then group operations correct after finishing (one of) the modular inverse implementation.
I think performance should probably be the primary concern. However if correctness is the main motivation for this PR, then things get debatable. Each approach to formal correctness comes with a boatload of their own caveats.
P.S. If it makes a difference I could expedite my proof of correctness of the field operations. Certainly this code is far easier to reason about than the modular inverse code, and could be completed much quicker. Still the various caveats remain.
I think we should discuss this in the next IRC meeting on Monday. (I created a "next-meeting" tag for convenience.)
I think performance should probably be the primary concern. However if correctness is the main motivation for this PR, then things get debatable.
Ideally, this PR should not affect performance, as the algorithm should be ours (modulo some low-level C details). So the primary concern here is indeed correctness (if you ask me).
What would improve performance is cryptopt asm, which requires fiat-crypto. In principle, we could add the asm without updating the C code. (We felt it's better to update the C code first, see https://github.com/bitcoin-core/secp256k1/issues/1261, but this could be reconsidered...)
Sure thing. I was really hoping we'd land at a place where we could do both. https://github.com/mit-plv/fiat-crypto/issues/1560#issuecomment-1652241177 mentions some bedrock2 thing which sounds like we might eventually get there, but not today.
I've incorporated the changes:
- moved
u128_add_u128_u128with pointers toint128.hand implementednativeandstruct - moved the wrappers above the autogenerated comment
I've also ran my benchmark suite (secp_bench) on my 10 machines. With Clang and GCC, with defaults and -O3 -mtune=native -march=native
Clang -O3 -mtune=native -march=native
| implementation | default_asm | default_c | fiat_c | fiat_cryptopt |
|---|---|---|---|---|
| ecmult_gen | 15.4785 | 15.1864 | 14.9061 | 13.9127 |
| ecmult_const | 29.3045 | 28.0044 | 27.2884 | 26.4647 |
| ecmult_1p | 23.2979 | 21.9812 | 21.5529 | 20.9758 |
| ecmult_0p_g | 16.9407 | 15.72 | 15.4955 | 15.3359 |
| ecmult_1p_g | 13.6274 | 12.8632 | 12.6361 | 12.2814 |
| field_half | 0.00518161 | 0.00510187 | 0.00493484 | 0.00512773 |
| field_normalize | 0.0047701 | 0.00483506 | 0.0047308 | 0.00475533 |
| field_normalize_weak | 0.00283401 | 0.00282353 | 0.0028121 | 0.00286139 |
| field_sqr | 0.0136218 | 0.0141461 | 0.0138296 | 0.0119747 |
| field_mul | 0.017029 | 0.0167826 | 0.016331 | 0.014328 |
| field_inverse | 1.51311 | 1.5147 | 1.64789 | 1.64148 |
| field_inverse_var | 0.921765 | 0.918659 | 0.924158 | 0.924338 |
| field_is_square_var | 1.21284 | 1.21378 | 1.22373 | 1.22978 |
| field_sqrt | 3.81394 | 3.11671 | 3.79821 | 3.41099 |
GCC -O3 -mtune=native -march=native
| implementation | default_asm | default_c | fiat_c | fiat_cryptopt |
|---|---|---|---|---|
| ecmult_gen | 16.7344 | 16.2269 | 15.7203 | 14.9838 |
| ecmult_const | 31.0237 | 29.2068 | 29.1693 | 27.2251 |
| ecmult_1p | 24.5181 | 23.3218 | 22.5183 | 21.2852 |
| ecmult_0p_g | 17.5664 | 16.801 | 16.2549 | 15.3251 |
| ecmult_1p_g | 14.3227 | 13.5419 | 13.1658 | 12.4306 |
| field_half | 0.00260505 | 0.0024295 | 0.002416 | 0.00248083 |
| field_normalize | 0.00717501 | 0.00694626 | 0.00684726 | 0.0070481 |
| field_normalize_weak | 0.00302477 | 0.00293117 | 0.00286769 | 0.00298777 |
| field_sqr | 0.0143681 | 0.012288 | 0.0163989 | 0.012114 |
| field_mul | 0.0176796 | 0.014202 | 0.0178385 | 0.014767 |
| field_inverse | 1.45613 | 1.44189 | 1.60521 | 1.66405 |
| field_inverse_var | 0.905133 | 0.87939 | 0.867982 | 0.893649 |
| field_is_square_var | 1.26129 | 1.21327 | 1.20329 | 1.23029 |
| field_sqrt | 3.94966 | 3.52407 | 4.29576 | 3.41179 |
Clang default
| implementation | default_asm | default_c | fiat_c | fiat_cryptopt |
|---|---|---|---|---|
| ecmult_gen | 14.5922 | 14.8619 | 15.2704 | 13.0178 |
| ecmult_const | 28.9864 | 29.0937 | 30.0857 | 25.7223 |
| ecmult_1p | 22.7649 | 22.5667 | 23.2965 | 20.3805 |
| ecmult_0p_g | 16.5215 | 16.1956 | 16.8933 | 14.6744 |
| ecmult_1p_g | 13.3438 | 13.1968 | 13.6445 | 11.9628 |
| field_half | 0.00268292 | 0.00246003 | 0.00239556 | 0.00242083 |
| field_normalize | 0.00700584 | 0.00699441 | 0.006914 | 0.00695979 |
| field_normalize_weak | 0.00282523 | 0.00282798 | 0.00280225 | 0.00281053 |
| field_sqr | 0.0137319 | 0.0128169 | 0.0146826 | 0.0119927 |
| field_mul | 0.0171099 | 0.0166114 | 0.017404 | 0.0142171 |
| field_inverse | 1.44857 | 1.44306 | 1.61719 | 1.61457 |
| field_inverse_var | 0.965295 | 0.963569 | 0.957785 | 0.960852 |
| field_is_square_var | 1.26972 | 1.2628 | 1.26665 | 1.26727 |
| field_sqrt | 3.77267 | 3.51891 | 4.02258 | 3.33573 |
GCC with default
| implementation | default_asm | default_c | fiat_c | fiat_cryptopt |
|---|---|---|---|---|
| ecmult_gen | 15.9837 | 14.9299 | 14.9616 | 14.4035 |
| ecmult_const | 30.1929 | 28.1791 | 28.0992 | 26.7815 |
| ecmult_1p | 23.8301 | 21.9197 | 21.971 | 20.8491 |
| ecmult_0p_g | 16.8746 | 15.8356 | 15.8409 | 14.9011 |
| ecmult_1p_g | 13.9126 | 12.8102 | 12.8655 | 12.1725 |
| field_half | 0.00263405 | 0.00241618 | 0.00236633 | 0.00249735 |
| field_normalize | 0.00737182 | 0.00743337 | 0.00731607 | 0.00734032 |
| field_normalize_weak | 0.00293929 | 0.0029294 | 0.00290844 | 0.00289157 |
| field_sqr | 0.0140036 | 0.0125453 | 0.0122373 | 0.0120457 |
| field_mul | 0.0169176 | 0.0143035 | 0.0148859 | 0.0143003 |
| field_inverse | 1.39951 | 1.39713 | 1.61788 | 1.61036 |
| field_inverse_var | 0.911792 | 0.912018 | 0.908337 | 0.907321 |
| field_is_square_var | 1.19553 | 1.20238 | 1.18607 | 1.19135 |
| field_sqrt | 3.86876 | 3.56773 | 3.46217 | 3.36305 |
I think performance should probably be the primary concern. However if correctness is the main motivation for this PR, then things get debatable. Each approach to formal correctness comes with a boatload of their own caveats.
I just want to let you know that I'd be more than happy to explain the good/bad/ugly of what we fiat-crypto proofs are about in detail, if this would be of interest to the maintainers. Evaluating verification techniques can indeed be subtle, and I would try to be as unbiased of a resource as I can.
In short, fiat-crypto proofs about field arithmetic cover limb representation and modular-reduction algorithm, correct determination of bit-sizes of intermediate values (no unexpected overflow / lost carries), and low-level arithmetic optimization. The last verified format of code fiat-crypto-to-C translation is arithmetic operators with truncating casts r encoding desired value ranges of C types to be used for intermediates, as they appear on the left-hand sides of this match statement. Translating word arithmetic in fiat-crypto to word arithmetic in C is not covered by the proofs, and loading of the inputs and storing of the outputs isn't either. The latter is easy to inspect, but the former can be subtle due to integer promotion and other type-based arithmetic. These aspects are modeled in fiat cryptography, and the C-code generation explicitly considers that model, but in the end we are taking on faith that clang and gcc follow the same rules as encoded in our model. This same assumption would be present even if the last translation was verified against a semantics of C, as that semantics would refer to the same (or very similar) arithmetic rules. Every seriously-exercised formal model of C I have used or built has had corner cases where it didn't cover the behavior of some popular C compiler. That said, larger-than-int unsigned-integer arithmetic is as simple as C semantics get. For reviewing the generated code, it may also be helpful to consider how the same arithmetic was expressed in a different language such as Zig.
The main goal of the ongoing fiat-crypto-bedrock2 integration is to enable computer-checked integration proofs of generated field-arithmetic code, underlying polyfills, elliptic-curve algorithms, and application code above. In other words, we are looking to ensure that one proven layer of the cryptographic implementation does not make incompatible assumptions about another layer; we are working to avoid broken integration of individually proven components. Having a slightly lower-level language between fiat-crypto and C is a side benefit from our perspective. Similarly, pretty-printing (in bedrock2) and parsing (in CompCert before VST) do not represent a significant difference in our key considerations for semantic correctness; we chose pretty-printing to alleviate the user-experience concerns around ruling out common but poorly understood features of C that would be hard to model soundly. (The status of this work is that we have an x25519 implementation and some more proven and under submission, but not all performance-hurting shortcuts have been removed yet.)
If anyone wants to look at the compiler output of e3affa143f510e45d1e23f71c0b8c11c030680ac vs master: https://godbolt.org/z/fPTafdqd8
Here are my notes on the topic of caveats of formal proofs:
Regarding using VST
-
The C-lightgen program to convert C to Coq is proprietary and requires a license to use. So this step of the proof cannot easily be independently checked by the community. At best we can distribute the output of C-lightgen.
- Currently Blockstream is paying to use C-lightgen for Simplicity, and thus only those aspects of libsecp256k1 (the validation side of the library) are available for us to run C-lightgen on.
-
VST only proves the correctness with respect to the (non-free) CompCert compiler and makes no guarantees about GCC or Clang or any other compiler.
- Furthermore each VST proof only applies to one of either the 32-bit or 64-bit versions of Compcert.
- VST does provide a best-effort job to disallow signed integer overflow, but makes no formal guarantees about whether it achieves this or not (e.g. see No check for signed overflow in some cases. · Issue #561 · PrincetonUniversity/VST)
-
VST specifications do not cover termination. In other words, only partial correctness is verified.
-
VST is restricted to a subset of C that doesn’t use structures directly as values, i.e. no assignment or passing by copy to functions or returning them from functions.
- Other restrictions also apply, such as no Duff's Device and no longjmps.
-
To first-order approximation, I would end up as the sole maintainer of any VST proofs.
Regarding using Fiat-Crypto
I’m less familiar with Fiat-Crypto, so the following caveats are only to the best of my knowledge. @andres-erbsen has written more on this topic above.
-
The generated C code is translated from a proven correct algorithm (that terminates) but makes no formal guarantees about the resulting C code at all, no matter what compiler it is compiled on.
-
The proofs cannot be extended beyond what Fiat-Crypto generates, whereas VST can, in principle, reason about the full libsecp256k1 library in a fully integrated manner (no need to translate specifications between various proof systems.)
Miscellaneous
Neither system makes any guarantees about constant-timedness, and arguably it is impossible to make any such statements about C code anyways.
Ideally we’d have Fiat-Crypto generate C code that operates within VST’s limitations, however that seem to require an unknown amount of work on the Fiat-Crypto side, with some hints that sometime in the future it might be doable with bedrock2.
If we want to go on beyond the field code and prove the group code correct, or the multi-exponentiation with VST, we either need to machine generate the specifications of the field code functions, or generate the specifications by hand. Either way, the correctness of the VST proof of the code would only be correct upto the correctness of the specifications of any non-VST compatible Fiat-Crypto generated code. Whereas if everything were done in VST, then we would have correctness all the way to the bottom (subject to VST’s own caveats).
@roconnor-blockstream thank you for the overview. FWIW, I think we are trying to communicate very similar perspectives about current state of the tooling, though with different words.
As for grand plans for future proofs, I would like to discuss them more, but perhaps outside this thread? I think we are pursuing quite similar directions and one would hope that we would be able to do something to minimize duplication of effort. In particular, ongoing work is also proving elliptic-curve code and its use cases on top of fiat-crypto using bedrock2 (and we are connecting the proofs in Coq using the fiat-crypto field-arithmetic specifications). Instead of proving the same code twice using different tools, I am guessing that our time would be better spent increasing proof coverage throughout the codebase, connecting between different toolchains informally at simple interfaces as needed. I am saying this partly because I believe bedrock2 C is even further semantically diverged from CompCert C (details in §2.4;2.5) than the fiat-crypto output here, but also I think it's just a good idea regardless. Or perhaps there's an even better strategy, but more coordination seems desirable either way.
Ooph! Sounds like hypothetical bedrock2 generated C would end up containing a lot of uintptr_t to pointer casts. I wonder how libsecp256k1 maintainers would feel about that? Sure technically it is all fine ... but for some reason I feel uneasy about it.
But yeah, I guess the hypothetical bedrock2 generated C is likely even worse with respect to also facilitating VST proofs.
Yeah, I'm bitter about the uintptr_t casts too; you can see some checked-in output here. I'd love to have neater output, but it is unclear whether it would be even feasible to accurately formalize the cases of pointer-manipulation semantics where standard C threatens undefined behavior even though VST & CompCert assume that something reasonable happens.
Do you think https://github.com/bitcoin-core/secp256k1/issues/181 would be a good place to continue this discussion, or shall we create a new issue?
Do you think #181 would be a good place to continue this discussion, or shall we create a new issue?
Sure, that works. (I'd love to have the ability to move comments/threads into other issues...)
I've finished the correctness proofs in VST of secp256k1_fe_mul_inner and secp256k1_fe_sqr_inner. You can find the rendered proof at https://htmlpreview.github.io/?https://github.com/ElementsProject/simplicity/blob/verif_field_5x52_int128_impl_0/alectryon/verif_field_5x52_int128_impl.v.html.
The proof itself is probably not all that interesting beyond the fact of its existence. You can step through it if you like and watch the Hoare clauses evolve as it is progresses through the C implementation. The proof also covers all the VERIFY_CHECK statements. I've also taken the liberty of applying #1438 and #1442, presuming that they will be merged, though the proof script itself works even without these PR applied.
The more relevant bit is the formal specification of the functions that the proofs relate to. These are secp256k1_fe_mul_inner_spec and secp256k1_fe_sqr_inner_spec which have rendered versions at https://htmlpreview.github.io/?https://github.com/ElementsProject/simplicity/blob/verif_field_5x52_int128_impl_0/alectryon/spec_field_5x52.v.html.
These specifications are written in the formal separation logic language of VST. These specification are made somewhat complicated by needing to support the fact that the r parameter may or may not alias the a parameter. To that end, I've written "sub-specifications" secp256k1_fe_mul_inner_spec_restrict and secp256k1_fe_mul_inner_spec_alias which provided somewhat simpler specification for the specific cases where r does not/does alias a. These sub-specifications can be found in the same file, along with the proofs that these are indeed sub-specifications of the formal specification.
I'm happy to go over these specification in this thread here, or perhaps at the next IRC meeting, or more generally discuss the future of this PR.
The last meeting where this topic was discussed was https://gnusha.org/secp256k1/2023-07-31.log, and the formalization of these two functions is me addressing the comment
< real_or_random> I mean it would certainly be interesting to compare the VST efforts for the field code to fiat
which we can now do.
It's been 17 weeks since the last meeting, which means I worked a little bit less than 17 days on the effort. Let's call it three weeks of work. (Edit: I didn't start working on it right away, so maybe closer to two weeks of work.)
As for this PR; my personal preference is to close it. With this VST proof in hand we have comparable assurance to what the fiat-crypto code gives us, but in a way that can let us proof the (functional) correctness of subsequent functions. At some point in time, if VST ever supports passing/returning structs, or fiat-crypto is able to generate VST compatible C code, or we get some other formal system able to practically reason about arbitrary C code, we can revisit the inclusion of the fiat-crypto code.
However, whether or not to close this PR is certainly debatable, and we can discuss the issue here and/or at the next IRC meeting. I am not the decider.
However, whether or not to close this PR is certainly debatable, and we can discuss the issue here and/or at the next IRC meeting.
I think that's a good idea.