snarkVM icon indicating copy to clipboard operation
snarkVM copied to clipboard

[Bug] Deployment can fail depending on sampled values

Open mikebenfield opened this issue 5 months ago • 1 comments

🐛 Bug Report

When I try to deploy this program:

program test.aleo;

function address_cast:
    input r0 as address.private;
    cast r0 into r1 as scalar;
    output r1 as scalar.private;

Deployment fails with this message

Failed to eject scalar value: The scalar is greater than or equal to the modulus.

because synthesize_key has sampled an address that cannot be cast to a scalar.

For a demo of this failure, see this repo, and run with RUST_BACKTRACE=full cargo run --release. (It's set up to use snarkvm 3.8.0 but it fails on staging also.)

mikebenfield avatar Jul 11 '25 00:07 mikebenfield

TL;DR Next steps

We will ignore the problem for now. In the future we can:

  • update the circuit, which would not be backwards compatible.
  • skip the ejection and console::<SomeType> == circuit<SomeType> checks, which would be a pity because they are nice defence in depth checks

Full problem description

  • The function from_bits_le converts a slice of (circuit) Boolean into a (circuit) Scalar.
  • When deploying a contract, the current codebase populates the circuit with a partially random assignment for key generation (but the corresponding constraints are not enforced).
  • In some current tests and use cases, the Aleo instruction cast is used to convert an Address into a Scalar, which does involve from_bits_le. When the codebase ejects the result of from_bits_le, the actual (circuit) Field underlying the Scalar often doesn’t fit (it does not belong to [0, …, p - 1]), but this is enforced[^1] by Scalar::eject_value. triggering a failure.

Soundness before

The inspected part of the code (mainly from_bits_le) was sound before the PR. Crucially,

  • The function asserts the bits beyond Scalar::size_in_bits() are false.
  • In the simpler case where fewer bits were provided than Scalar::size_in_bits(), it reconstructs the Scalar adding suitable powers of two (as constants) multiplied by the bits (correctly enforcing the in-circuit operations with constraints).
  • In the subtler case (modified in the PR) where exactly Scalar::size_in_bits() bits were provided (or more were been provided but have been asserted to be false in the first point), the circuit reconstructs the Scalar with that many bits and separately constraints the sequence of bits to correspond to at most p - 1. This is enforced securely in-circuit (cf. also here). This rejects witnesses with bits corresponding to (Field) values between p and 2251 - 1.

Changes and soundness after

The code is also sound after the PR (in terms of valid proofs). The changes to are from_bits_le are as follows (all within the block corresponding to the third point above):

  • The call Boolean::is_less_than_or_equal_constant is replaced by the two steps it performs: computing the Booleanis_less_or_equal" and then E::assert-ing (i.e. in-circuit constraining) it to be true.
  • Using the variable is_less_or_equal, which the function now has direct access to, a “modified most significant bit” is constructed as the AND of the actual one received by the function and is_less_or_equal.
  • The Scalar is reconstructed using the original bits except for the most significant one, where the modified bit is used. Crucially, in every valid assignment, it has the same value as it had before the PR (because is_less_or_equal is enforced to be true), In particular, it is always less than p. However, in the random assignments arising during deployment and key synthesis, the Scalar is reduced modulo 2250 due to the introduced AND, and the eject_value failure is thus prevented.

Thoughts

  • The R1CS structure constructed by from_bits_le is different after the changes introduced in the PR, both in terms of variables and constraints: a new R1CS variable appears (of type Boolean), which is constrained to be [previous-most-significant-bit AND leq-boolean-output], and it is that modified bit which is used as the most significant bit in the reconstruction of the Scalar.

    This change in R1CS structure also leads to different verification keys, which may be a breaking change and force recomputation of keys or other special handling of previously existing contracts (I am not as well equipped to analyze the practical implications of this for the blockchain). \

  • Ideally, the ejection error could be handled more cleanly without (circuit-related) breaking changes by tweaking key generation (e. g. to avoid the population with dummy witnesses or avoiding out-circuit checks on those values) - but that may have non-trivial implications for the existing infrastructure.

    • When sampling dummy field/address inputs during deploy and verify_deployment, we don't sample a random value but we sample '0', which doesn't hit this issue. However, this doesn't resolve e.g. output registers which are invalid Scalars.
  • We could entirely skip the ejection and console::<SomeType> == circuit<SomeType> checks. Which would be a bit of a pity, because they are nice defense in depth checks of the VM. \

  • If the PR changes are assumed, consideration should be given to whether to implement analogous changes in the console counterpart of the function (but it is possible they are not desirable there). \

  • A decision should be (and perhaps already has been) made on the expected behaviour of casting an Address into a Scalar when the x coordinate (a Field) is at most p. Halting seems natural (and mirrors e. g. the behaviour of Rust with basic numeric types), although other behaviours such as truncation are already present in the codebase - in the case of the linked code, the Field is (in practical terms) reduced modulo 2250 when converting it into a Scalar.

vicsn avatar Aug 12 '25 12:08 vicsn