Unrolling loops with helper functions
When specifying the behaviour of functions that involves a loop, it is sometimes hard to express the post-state on a closed form. A real life example is Pot.rpow, which calculates exponentials of fixed-point integers, or the deposit function of the Eth2.0 registration contract.
Here I propose a syntax which introduces a where construct that I think could handle many such cases. But first, I would like to express the shortcomings of the proposal
In the general case, the result of a loop is difficult to express in a functional language. Especially if it updates storage in unstructured ways. For example, something like
for (i = 0; i < N; i++) {
assembly {
sstore(i,i)
}
}
seems hard to express elegantly in our current setting. However, such contracts are arguably pretty strange anyway.
But in most use cases, when the write-access of a loops aren't determined dynamically, the behaviour of the loop can be expressed as with the help of a recursive, pure function. To allow for such helper functions, I propose to add a new header, where, where they can be defined.
For example, the semantics of:
uint[] zeroHashes;
function zerohashN(uint256 n) {
bytes32 res = "0x0";
for (i = 0; i <= n; i++;) {
res += keccak256(res);
}
zeroHashes[n] = s;
is captured by:
behaviour zeroHashN of Arithmetic
interface zeroHashN(uint256 n)
storage
zeroHashes[n] |-> _ => nth_hash_of(0, n)
where
nth_hash_of(h, n) = if n == 0 then h else hash(nth_hash_of(h, n - 1))
This example came up as I was writing https://github.com/ethereum/smart-contract-spec-lang/blob/bnfc/examples/deposit/deposit-spec.md#deposit
Huge bikeshedding opportunity on using => or = in the function definitions for the where header. VOTE NOW
Is the = operator in the where meaningfully semantically different than the => operator in the above storage statement?
cracks open the door to the bikeshed
@djrtwo they are similar in the sense that the semantics of both is that the LHS rewrites to the RHS, but their difference is akin to that of big step vs. small step semantics: the RHS of a => would be the post state after the entire function call has been processed, whereas the RHS of a where function represents an intermediate state of a calculation necessary to be able to express the former.
They would also be treated somewhat differently by proving engines. I imagine the helper functions defined in the where header would need to be translated into the native language of the prover as independent functions, so that the full step transitions could depend on them
@MrChico how about
behaviour zeroHashN of Arithmetic
interface zeroHashN(uint256 n)
storage
zeroHashes[n] |-> _ => h | nth_hash_of(0, n, h)
where
nth_hash_of(h_in, 0, 0)
nth_hash_of(h_in, n, h_out) <= nth_hash_of(h_in, n - 1, h), hash(h, h_out)
I guess it's just a matter of function vs predicate ;)
But I do find predicates more general for these. You could write a loop invariant predicate that (to me) would be more expressive than a function, and you could still update storage with it.
@MrChico how about
behaviour zeroHashN of Arithmetic interface zeroHashN(uint256 n) storage zeroHashes[n] |-> _ => h | nth_hash_of(0, 0, h) where nth_hash_of(h_in, 0, 0) nth_hash_of(h_in, n, h_out) <= nth_hash_of(h_in, n - 1, h), hash(h, h_out)
Sorry, I don't understand what this means
You can read it as:
nth_hash_ofh_inforn = 0is 0nth_hash_ofh_infor anynish_outifnth_hash_ofh_inn - 1times ish, andhashofhish_out.
Oh, I see. But where do you define these predicates? Are they left uninterpeted or is there some logic programming conventions here that I'm unaware of?
Also, why are they more general? It seems you're just asserting a relation between a functions input and output...
This is the interpretation of the predicate, as long as hash is also defined. Note that the second case does cover all generic inputs, and the base case is sufficient to guarantee termination, so there's no uninterpreted scenario.
I take that back. I don't think it's more general. I just think the syntax is easier to apply (very biased here I'm sure), in the sense that you can capture inductive invariants with a single predicate inv(x, y, z) which is defined later (in where), without having to assign to a storage slot/variable.
rpow example:
function rpow(uint x, uint n, uint b) public pure returns (uint z) {
assembly {
switch x case 0 {switch n case 0 {z := b} default {z := 0}}
default {
switch mod(n, 2) case 0 { z := b } default { z := x }
let half := div(b, 2) // for rounding.
for { n := div(n, 2) } n { n := div(n,2) } {
let xx := mul(x, x)
if iszero(eq(div(xx, x), x)) { revert(0,0) }
let xxRound := add(xx, half)
if lt(xxRound, xx) { revert(0,0) }
x := div(xxRound, b)
if mod(n,2) {
let zx := mul(z, x)
if and(iszero(iszero(x)), iszero(eq(div(zx, x), z))) { revert(0,0) }
let zxRound := add(zx, half)
if lt(zxRound, zx) { revert(0,0) }
z := div(zxRound, b)
}
}
}
}
}
behaviour rpow of Jug
interface rpow(uint x, uint n, uint b)
returns rpow(x, n, b, 0)
where
rpow(x, 0, b, acc) = b
rpow(x, 1, b, acc) = acc
rpow(x, n, b, acc) = rpow(x, n - 1, b, ((acc * x) + b / 2) / b)
After some more discussions, here's some key factors that lead us to tentatively agree on the "functional" format:
- It's probably easier to understand for most of our target audience, assuming that one doesn't go too deep into the swamp of higher abstractions.
- While the logical format can ease the proving of such claims, we want to keep specifications independent of: a) The implementation details of the contracts and b) The methods used to prove the specifications
Both of these properties should aid in the readability of specifications allowing readers to know what the contract does without distracting them with the details of how it is done or how the proof that demonstrates that it actually does it is conducted.
Furthermore, when proving "higher level properties" about the contract (i.e. properties over the set of atomic state transitions (i.e. "big step" properties)), neither a) and b) are relevant, and is disregarded anyway.
We still need to have ways of expressing invariants or lemmas in a comprehensible, back-end independent manner, but the point here is that this shouldn't be in scope for behaviour specs, but rather something that belongs here