[Bug] Deployment can fail depending on sampled values
## 🐛 Bug Report
When I try to deploy this program:

    program test.aleo;

    function address_cast:
        input r0 as address.private;
        cast r0 into r1 as scalar;
        output r1 as scalar.private;
Deployment fails with this message:

    Failed to eject scalar value: The scalar is greater than or equal to the modulus.

because `synthesize_key` has sampled an address that cannot be cast to a scalar.
For a demo of this failure, see this repo, and run with `RUST_BACKTRACE=full cargo run --release`. (It's set up to use snarkvm 3.8.0 but it fails on staging also.)
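For reference, the demo boils down to something like the following sketch, which loads the program above and attempts a deployment programmatically. The module paths, the `Testnet3`/`AleoV0` type names, and the exact `Process::load`/`Process::deploy` signatures are assumptions about the snarkVM version in use and may differ on staging; treat this as a sketch of the shape of the repro, not a drop-in replacement for the linked repo.

```rust
// Hedged sketch of the linked demo (module paths, type names, and the deploy signature
// are assumptions about the snarkVM version in use and may differ on staging).
// Assumed Cargo dependencies: snarkvm, anyhow, rand.
use anyhow::Result;
use snarkvm::circuit::AleoV0;
use snarkvm::console::network::Testnet3;
use snarkvm::synthesizer::{Process, Program};

fn main() -> Result<()> {
    // The failing program from the report.
    let program: Program<Testnet3> = "program test.aleo;

function address_cast:
    input r0 as address.private;
    cast r0 into r1 as scalar;
    output r1 as scalar.private;"
        .parse()?;

    // Deploying synthesizes the proving/verifying keys; synthesize_key populates the
    // circuit with a partially random assignment, and ejecting the result of the cast
    // can then fail with "The scalar is greater than or equal to the modulus."
    let process = Process::<Testnet3>::load()?;
    let _deployment = process.deploy::<AleoV0, _>(&program, &mut rand::thread_rng())?;
    Ok(())
}
```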
## TL;DR Next steps
We will ignore the problem for now. In the future we can:
- update the circuit, which would not be backwards compatible.
- skip the ejection and `console::<SomeType> == circuit<SomeType>` checks, which would be a pity because they are nice defense-in-depth checks.
## Full problem description
- The function `from_bits_le` converts a slice of (circuit) `Boolean` into a (circuit) `Scalar`.
- When deploying a contract, the current codebase populates the circuit with a partially random assignment for key generation (but the corresponding constraints are not enforced).
- In some current tests and use cases, the Aleo instruction `cast` is used to convert an `Address` into a `Scalar`, which does involve `from_bits_le`. When the codebase ejects the result of `from_bits_le`, the actual (circuit) `Field` underlying the `Scalar` often doesn't fit (it does not belong to [0, …, p - 1]), but this is enforced[^1] by `Scalar::eject_value`, triggering a failure (a toy illustration of this follows the list).
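To make the failure mode concrete, here is a small self-contained toy model in plain Rust (not snarkVM code) with a 7-bit stand-in modulus p = 97 in place of the real 251-bit scalar modulus: a full-width little-endian bit string can encode values in [p, 2^n - 1], which is exactly the range that `Scalar::eject_value` rejects out of circuit.

```rust
/// Little-endian bit string -> integer, mirroring the role of from_bits_le.
fn value_of(bits: &[bool]) -> u128 {
    bits.iter().rev().fold(0, |acc, &b| (acc << 1) | b as u128)
}

fn main() {
    // Toy "scalar field": modulus p = 97 with size_in_bits = 7 (valid scalars are 0..=96),
    // standing in for the real 251-bit scalar modulus.
    let p: u128 = 97;

    // A full 7-bit little-endian string can encode anything up to 2^7 - 1 = 127,
    // so some strings (here 99 = 0b1100011) encode values in [p, 2^7 - 1].
    let bits = [true, true, false, false, false, true, true];
    let value = value_of(&bits);
    assert_eq!(value, 99);

    // Conceptually, this is the out-of-circuit check behind the reported error
    // "The scalar is greater than or equal to the modulus."
    if value >= p {
        println!("ejection would fail: {value} >= {p}");
    }
}
```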
## Soundness before
The inspected part of the code (mainly `from_bits_le`) was sound before the PR. Crucially:
- The function asserts the bits beyond `Scalar::size_in_bits()` are `false`.
- In the simpler case where fewer bits were provided than `Scalar::size_in_bits()`, it reconstructs the `Scalar` by adding suitable powers of two (as constants) multiplied by the bits (correctly enforcing the in-circuit operations with constraints).
- In the subtler case (modified in the PR) where exactly `Scalar::size_in_bits()` bits were provided (or more were provided but have been asserted to be `false` per the first point), the circuit reconstructs the `Scalar` with that many bits and separately constrains the sequence of bits to correspond to at most p - 1. This is enforced securely in-circuit (cf. also here). This rejects witnesses with bits corresponding to (`Field`) values between p and 2^251 - 1 (a toy model of this pre-PR logic follows the list).
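As a companion to the three points above, here is a runnable toy model of the pre-PR shape of `from_bits_le`, again in plain Rust with the 7-bit stand-in modulus; the in-circuit constraints are modelled as plain assertions, so this is only a sketch of the constraint logic, not the actual circuit code.

```rust
const P: u128 = 97; // toy modulus, standing in for the real 251-bit scalar modulus
const SIZE_IN_BITS: usize = 7; // toy stand-in for Scalar::size_in_bits() = 251

/// Little-endian bit string -> integer.
fn value_of(bits: &[bool]) -> u128 {
    bits.iter().rev().fold(0, |acc, &b| (acc << 1) | b as u128)
}

/// Toy model of the pre-PR from_bits_le: in-circuit constraints are modelled as assertions.
fn from_bits_le_pre_pr(bits: &[bool]) -> u128 {
    // 1. Bits beyond size_in_bits() are asserted to be false.
    assert!(bits.iter().skip(SIZE_IN_BITS).all(|&b| !b));
    let bits = &bits[..bits.len().min(SIZE_IN_BITS)];
    if bits.len() < SIZE_IN_BITS {
        // 2. Fewer bits: reconstruct as a weighted sum of (constant) powers of two.
        value_of(bits)
    } else {
        // 3. Exactly size_in_bits() bits: additionally constrain the bit string to encode
        //    at most p - 1, rejecting witnesses in [p, 2^n - 1].
        let value = value_of(bits);
        assert!(value <= P - 1, "witness encodes a value >= p");
        value
    }
}

fn main() {
    // 31 < 97: accepted.
    assert_eq!(from_bits_le_pre_pr(&[true; 5]), 31);
    // 99 >= 97: the third constraint is violated (this call panics).
    let result = std::panic::catch_unwind(|| {
        from_bits_le_pre_pr(&[true, true, false, false, false, true, true])
    });
    assert!(result.is_err());
}
```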
## Changes and soundness after
The code is also sound after the PR (in terms of valid proofs). The changes to `from_bits_le` are as follows (all within the block corresponding to the third point above):
- The call to `Boolean::is_less_than_or_equal_constant` is replaced by the two steps it performs: computing the `Boolean` `is_less_or_equal` and then `E::assert`-ing (i.e. in-circuit constraining) it to be `true`.
- Using the variable `is_less_or_equal`, which the function now has direct access to, a "modified most significant bit" is constructed as the AND of the actual one received by the function and `is_less_or_equal`.
- The `Scalar` is reconstructed using the original bits except for the most significant one, where the modified bit is used. Crucially, in every valid assignment it has the same value as it had before the PR (because `is_less_or_equal` is enforced to be `true`); in particular, it is always less than p. However, in the random assignments arising during deployment and key synthesis, the `Scalar` is reduced modulo 2^250 due to the introduced AND, and the `eject_value` failure is thus prevented (a toy model of this post-PR logic follows the list).
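And the corresponding toy model of the post-PR logic (full-width case only), where `enforce` distinguishes a proved assignment from the unenforced random assignment populated during key synthesis; again a plain-Rust sketch of the constraint logic with the same toy parameters as before, not the actual circuit code.

```rust
const P: u128 = 97; // toy modulus, as in the previous sketch
const SIZE_IN_BITS: usize = 7;

/// Little-endian bit string -> integer.
fn value_of(bits: &[bool]) -> u128 {
    bits.iter().rev().fold(0, |acc, &b| (acc << 1) | b as u128)
}

/// Toy model of the post-PR from_bits_le (full-width case only). `enforce` is true for a
/// real proof and false for the random assignment populated during key synthesis.
fn from_bits_le_post_pr(bits: &[bool; SIZE_IN_BITS], enforce: bool) -> u128 {
    // Step 1: compute is_less_or_equal and (when constraints are enforced) assert it.
    let is_less_or_equal = value_of(bits) <= P - 1;
    if enforce {
        assert!(is_less_or_equal);
    }
    // Step 2: modified most significant bit = original msb AND is_less_or_equal.
    let mut modified = *bits;
    modified[SIZE_IN_BITS - 1] &= is_less_or_equal;
    // Step 3: reconstruct from the original bits with the modified msb. In a valid
    // assignment this equals the pre-PR value; in a random assignment it is reduced
    // modulo 2^(SIZE_IN_BITS - 1), so ejection no longer fails.
    value_of(&modified)
}

fn main() {
    let bad = [true, true, false, false, false, true, true]; // encodes 99 >= 97
    // Unenforced (key synthesis): the msb is zeroed, 99 is reduced to 99 - 64 = 35 < 97.
    assert_eq!(from_bits_le_post_pr(&bad, false), 35);
    // Enforced (a real proof): this witness is simply invalid (the assertion fires).
    assert!(std::panic::catch_unwind(|| from_bits_le_post_pr(&bad, true)).is_err());
}
```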
## Thoughts
- The R1CS structure constructed by `from_bits_le` is different after the changes introduced in the PR, both in terms of variables and constraints: a new R1CS variable appears (of type `Boolean`), which is constrained to be [previous-most-significant-bit AND leq-boolean-output], and it is that modified bit which is used as the most significant bit in the reconstruction of the `Scalar`. This change in R1CS structure also leads to different verification keys, which may be a breaking change and force recomputation of keys or other special handling of previously existing contracts (I am not as well equipped to analyze the practical implications of this for the blockchain).
- Ideally, the ejection error could be handled more cleanly without (circuit-related) breaking changes by tweaking key generation (e.g. by avoiding the population with dummy witnesses or avoiding out-of-circuit checks on those values), but that may have non-trivial implications for the existing infrastructure.
- When sampling dummy field/address inputs during `deploy` and `verify_deployment`, we don't sample a random value but we sample '0', which doesn't hit this issue. However, this doesn't resolve e.g. output registers which are invalid `Scalar`s.
- We could entirely skip the ejection and `console::<SomeType> == circuit<SomeType>` checks, which would be a bit of a pity, because they are nice defense-in-depth checks of the VM.
- If the PR changes are adopted, consideration should be given to whether to implement analogous changes in the console counterpart of the function (but it is possible they are not desirable there).
- A decision should be (and perhaps already has been) made on the expected behaviour of casting an `Address` into a `Scalar` when the x coordinate (a `Field`) is at least p. Halting seems natural (and mirrors e.g. the behaviour of Rust with basic numeric types), although other behaviours such as truncation are already present in the codebase; in the case of the linked code, the `Field` is (in practical terms) reduced modulo 2^250 when converting it into a `Scalar` (both semantics are sketched below).
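For concreteness, the two candidate semantics in the same toy setting (the function names are hypothetical, not snarkVM API):

```rust
const P: u128 = 97; // toy modulus, as in the earlier sketches

/// "Halting" semantics: refuse field elements that are not valid scalars.
fn cast_field_to_scalar_halting(field: u128) -> Option<u128> {
    (field < P).then_some(field)
}

/// "Truncating" semantics: drop the most significant bit, i.e. reduce modulo 2^6 here
/// (modulo 2^250 for the real parameters), which is what the linked code does in practice.
fn cast_field_to_scalar_truncating(field: u128) -> u128 {
    field % (1 << 6)
}

fn main() {
    assert_eq!(cast_field_to_scalar_halting(99), None);
    assert_eq!(cast_field_to_scalar_truncating(99), 35);
}
```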