Field operations for small prime moduli
I'm trying to use fiat-crypto to synthesize (bedrock2) C code for NTT operations as used in some PQ lattice-based cryptography, where the prime modulus fits in one word. How difficult would it be to output "call-by-value" field operations, i.e., without the load/store opening/ending in functions ? I tried looking into how to modify the pipeline, though I have a bit of a hard time understanding, so I would appreciate some help to get started.
The relevant entry-point for the bedrock2 pipeline is https://github.com/mit-plv/fiat-crypto/blob/9bf41bc65f9cfb7263eda84fcfc878fe8da45ff7/src/Bedrock/Field/Stringification/Stringification.v#L200-L202 https://github.com/mit-plv/fiat-crypto/blob/9bf41bc65f9cfb7263eda84fcfc878fe8da45ff7/src/Bedrock/Field/Stringification/Stringification.v#L212-L220. As far as making translate_func avoid load/store when we only have one limb, I think @jadephilipoom or @andres-erbsen is better-placed to answer about the difficulty, since I haven't touched the bedrock2 code much.
There seems to be some documentation in https://github.com/mit-plv/fiat-crypto/blob/master/src/Bedrock/Field/README.md. translate_func itself is implemented at https://github.com/mit-plv/fiat-crypto/blob/9bf41bc65f9cfb7263eda84fcfc878fe8da45ff7/src/Bedrock/Field/Translation/Func.v#L45-L89
Adjusting the wrapper code shouldn't be too hard; it's at https://github.com/mit-plv/fiat-crypto/blob/9bf41bc65f9cfb7263eda84fcfc878fe8da45ff7/src/Bedrock/Field/Stringification/Stringification.v#L116-L147
If you want to play with this all using tactics like cbv and see it interactively, there's some debugging examples in https://github.com/mit-plv/fiat-crypto/blob/master/src/StandaloneDebuggingExamples.v, https://github.com/mit-plv/fiat-crypto/blob/master/src/CompilersTestCases.v, https://github.com/mit-plv/fiat-crypto/blob/master/src/SlowPrimeSynthesisExamples.v
Thank you for the pointers!
I'm not sure I understand whether there is a difference in using the Stringification stuff, vs using the C pretty printer from bedrock?
I was just planning to use the pretty printer, since I will be using Rupicola to synthesize the NTT code after I manage to teach it how to compile arrays of field elements.
To make sure I understand the translation pipeline:
- The files in
PushButtonSynthesis/transform the Gallina code inArithmetic/into someAPI.Expr t(this kinda looks like magic to me for now) - The
API.Expr tstuff is later consumed bytranslate_functo produce a bedrockfunc.
So I would need to modify/copy translate_func so that it avoids the load/stores when the number of limbs is 1, which I tried to do by removing some fluff around the call to translate_func', but this is harder to make Coq typecheck than I expected :/
The existing translate_func should work the way you want if you present the field element as an integer instead of a list; it assumes that lists are "arrays in memory" and that integers are "registers". If I were you, I'd try to approach it by adjusting Arithmetic/ or maybe writing a pass to transform the API.Expr t expression itself. The former is probably easier, since it sounds like you have some new Gallina definitions anyway -- if you make these operate on Z instead of list Z, everything should work as expected.
I think you want to add a reification of encode (?) (the inverse of eval, I believe) a la https://github.com/mit-plv/fiat-crypto/blob/704fe3d9a1a25f28be568d703c352a70f57ac4a0/src/PushButtonSynthesis/UnsaturatedSolinasReificationCache.v#L200-L210 or https://github.com/mit-plv/fiat-crypto/blob/704fe3d9a1a25f28be568d703c352a70f57ac4a0/src/PushButtonSynthesis/UnsaturatedSolinasReificationCache.v#L224-L234 cf also https://github.com/mit-plv/fiat-crypto/blob/704fe3d9a1a25f28be568d703c352a70f57ac4a0/src/PushButtonSynthesis/WordByWordMontgomeryReificationCache.v#L251-L256
Then we want versions of all the operations that compose with encode on the input side and eval on the output side (possibly we want to figure out how to work around https://github.com/coq/coq/issues/7954 so the reification cache isn't two times slower; probably we can do this just by adding reduction at
https://github.com/mit-plv/fiat-crypto/blob/704fe3d9a1a25f28be568d703c352a70f57ac4a0/src/PushButtonSynthesis/WordByWordMontgomeryReificationCache.v#L67-L68 ), and we can emit these operations in, e.g., https://github.com/mit-plv/fiat-crypto/blob/master/src/PushButtonSynthesis/WordByWordMontgomery.v in cases where the number of limbs is 1 (possibly gated by some top-level flag). The bedrock2 code can presumably do something similar.