protocol-solidity
[TASK] MerkleForest system + circuit
Initial phases, research oriented.
Goal
- We want to support, for each anchor, many more messages. Right now, with depth-30 trees, we can support 500 million transactions. If we simply scale this horizontally with more trees, we can support $N * 500{,}000{,}000$ transactions on each chain at the cost of only a $\log{N}$ factor, by adding new merkle trees horizontally.
- Eventually, for the identity systems, we want to link potentially 10,000s of groups together. This might be a slightly different variation on this exact system, but we will leverage some of the learnings/techniques used here.
Overview
The goal here is to have each upstream anchor system contain many merkle trees or a merkle forest. The new circuit would then be verifying:
- Data exists in one tree using a merkle membership proof. (happening on 1 chain)
- That tree exists in a forest -- a merkle tree accumulated with the other trees' merkle roots. (happening on 1 chain)
- Eventually, that the merkle forest exists in another merkle tree (Linked Anchors / Linked Merkle Forests on other chains). (happening on all the bridged chains)
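The three checks above can be sketched end to end in Python. This is a structural model only: SHA-256 stands in for the Poseidon hash used in the actual circuits, and the helper names (`merkle_root`, `merkle_proof`, `verify`) are illustrative, not the real circuit API.

```python
# Sketch: a leaf is proven inside one tree, and that tree's root is
# proven inside the forest. SHA-256 stands in for Poseidon.
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves):
    """Build a Merkle root over a power-of-two list of leaves."""
    level = list(leaves)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Return the sibling path for leaves[index]."""
    path, level, i = [], list(leaves), index
    while len(level) > 1:
        path.append(level[i ^ 1])
        level = [h(level[j], level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return path

def verify(leaf, index, path, root):
    node, i = leaf, index
    for sibling in path:
        node = h(node, sibling) if i % 2 == 0 else h(sibling, node)
        i //= 2
    return node == root

# Leaf-in-tree, then tree-in-forest (K = 4 trees for illustration).
leaves = [bytes([x]) * 32 for x in range(4)]
tree_roots = [merkle_root(leaves) for _ in range(4)]
forest_root = merkle_root(tree_roots)

assert verify(leaves[2], 2, merkle_proof(leaves, 2), tree_roots[0])
assert verify(tree_roots[0], 0, merkle_proof(tree_roots, 0), forest_root)
```

The third (cross-chain) check is the same pattern again, one level up, against the linked anchors' view.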
Alternative method
Instead of merkle-izing everything, we still use the set-membership proof at the end (over the set of merkle roots of the other chains), and we hash the local forest's roots before submitting them to the circuit.
Alternative method 2
Say each colored ball is called a `masterAnchor` and lives on one of the supported chains. It holds all necessary merkle roots inside an array, sorted by their index. Another idea is to calculate `poseidon(*merkleRoots)` and use the hashes calculated by the different `masterAnchor`s in the set-membership check.
Setting
Consider a bridge with $N$ anchors. Each anchor additionally stores its $N-1$ neighboring anchors' merkle roots, and each anchor stores $K$ merkle trees. This is the setting for all methods.
Method 1 w/ set membership
Method 1's approach, as I understand it, for an anchor $A$ is to:
- Take the array of $K$ merkle tree roots $[MR_1, MR_2, ..., MR_K]$
- Merkle-ize them into a merkle tree with root $MR_A$
- Relay $MR_A$ to anchors ${B, C, ... }$
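One property worth noting about Method 1: if the forest tree over the $K$ roots is maintained incrementally, a single tree root changing only requires rehashing its $\log_2{K}$ path to $MR_A$, not rebuilding the whole forest. A sketch of that update, with SHA-256 standing in for Poseidon and illustrative helper names:

```python
# Sketch: incremental update of the forest root MR_A after one of the
# K tree roots changes -- only log2(K) rehashes along the path.
import hashlib

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def build_levels(leaves):
    """Store every level of the forest tree so paths can be rehashed."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[j], prev[j + 1]) for j in range(0, len(prev), 2)])
    return levels

def update(levels, i, new_root):
    """Replace MR_i and rehash its path; returns the hash count."""
    levels[0][i] = new_root
    hashes = 0
    for d in range(len(levels) - 1):
        p = i - (i % 2)  # left sibling of the pair at this level
        levels[d + 1][i // 2] = h(levels[d][p], levels[d][p + 1])
        hashes += 1
        i //= 2
    return hashes

K = 8
levels = build_levels([bytes([i]) * 32 for i in range(K)])
assert update(levels, 5, b"\xaa" * 32) == 3  # log2(8) hashes
# The incrementally updated root matches a full rebuild.
assert levels[-1][0] == build_levels(levels[0])[-1][0]
```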
Method 2
Method 2's approach, as I understand it, for an anchor $A$ is to:
- Take the array of $K$ merkle tree roots $[MR_1, MR_2, ..., MR_K]$
- Hash them: $cm_A = \text{Poseidon}([MR_1, MR_2, ..., MR_K])$
- Relay $cm_A$ to anchors ${B, C, ... }$
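The contrast with Method 1 is that Method 2's commitment is flat: any single root changing forces recomputing the whole width-$K$ hash. A minimal sketch, with SHA-256 over a concatenation standing in for a width-$K$ Poseidon:

```python
# Sketch of Method 2's commitment: one wide hash over all K roots.
# SHA-256 over the concatenation stands in for a width-K Poseidon.
import hashlib

def commit(roots):
    return hashlib.sha256(b"".join(roots)).digest()

K = 8
roots = [bytes([i]) * 32 for i in range(K)]
cm_A = commit(roots)

# Any single root change invalidates the full commitment.
roots[5] = b"\xaa" * 32
assert commit(roots) != cm_A
```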
Let's now take the full end-to-end flow for a user who wishes to interact with such an anchor.
- A user Alice deposits a leaf into a merkle tree with root $MR_i$ (as an aside, consider that Alice targets anchor $B$, as in an asset application).
- $MR_i$ will update, and so on-contract we will need to recalculate $cm_A$, either with one Poseidon hash of width $K$ or with another method such as hashing pairs or triples at a time.
- $cm_A$ is relayed to ${B, C, ... }$.
- Alice goes to $B$ and proves knowledge of a leaf in $A$. In order to do so, she must (from the circuit's perspective):
  a. Get the preimage of her leaf commitment.
  b. Get the other merkle roots $MR_j \neq MR_i$.
  c. Compute the hash $cm_A'$.
  d. Prove set membership of $cm_A'$ in $view_B = \{cm_A, cm_B, ...\}$.
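Steps a-d of Alice's proof, as seen by the circuit at anchor $B$, can be sketched as follows. SHA-256 stands in for Poseidon, a Python set stands in for the in-circuit set-membership gadget, and all names (`check_proof`, `view_B`, etc.) are illustrative:

```python
# Sketch of the circuit-side check for Alice's proof at anchor B.
import hashlib

def H(*xs: bytes) -> bytes:
    return hashlib.sha256(b"".join(xs)).digest()

def check_proof(secret, nullifier, path, path_bits, other_roots, i, view_B):
    # (a) recompute the leaf commitment from its preimage
    node = H(secret, nullifier)
    # merkle membership of the leaf in tree MR_i
    for sibling, is_right in zip(path, path_bits):
        node = H(sibling, node) if is_right else H(node, sibling)
    mr_i = node
    # (b) + (c) recompute cm_A' from MR_i and the other K-1 roots
    roots = other_roots[:i] + [mr_i] + other_roots[i:]
    cm_A_prime = H(*roots)
    # (d) set membership of cm_A' in B's view of neighbor commitments
    return cm_A_prime in view_B

# Tiny worked example: a 2-leaf tree, K = 2 trees on anchor A.
secret, nullifier = b"\x01" * 32, b"\x02" * 32
leaf, other_leaf = H(secret, nullifier), b"\x03" * 32
mr_0 = H(leaf, other_leaf)   # Alice's leaf sits at index 0 (left)
mr_1 = b"\x04" * 32          # the other tree's root
view_B = {H(mr_0, mr_1)}     # cm_A as relayed from A to B

assert check_proof(secret, nullifier, [other_leaf], [False], [mr_1], 0, view_B)
```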
Trade-off
My thoughts are that if the computation of $cm_A$ uses more hashes than the computation of $MR_A$, then it is not advantageous to use Method 2 over Method 1. At first glance that is where the trade-off comes from, but it is possible that it comes into play in other places too. So basically $cm_A$ needs to cost fewer than $\log(K)$ hashes (the merkle-tree cost) in order to be more advantageous at that step.
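A back-of-the-envelope version of this trade-off, under an *assumed* cost model (not measured): treat a width-$w$ Poseidon as costing roughly $c(w) = a + b \cdot w$ constraints, with purely illustrative coefficients. Method 1 pays for a $\log_2{K}$-deep path of width-2 hashes; Method 2 pays for one width-$K$ hash.

```python
# Illustrative constraint-count model for the Method 1 vs Method 2
# trade-off. The coefficients are assumptions, not real Poseidon costs.
import math

def poseidon_cost(width, a=200, b=55):
    # assumed affine model; real Poseidon costs depend on the instance
    return a + b * width

def method1_cost(K):
    # log2(K) width-2 hashes along a merkle path
    return int(math.log2(K)) * poseidon_cost(2)

def method2_cost(K):
    # one width-K hash recomputed in-circuit
    return poseidon_cost(K)

# Under this model, the wide hash wins for small K and loses for large K.
assert method2_cost(4) < method1_cost(4)
assert method2_cost(256) > method1_cost(256)
```

The crossover point depends entirely on the real per-width Poseidon constraint counts, which is exactly the benchmarking question raised below.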
If everything after this point is just a set membership of some hash in a set of hashes, then the two methods should be equivalent, but I might be missing implementation details regarding these differences in-circuit.
Method 1 is better than Method 2 above.
The remaining question is should we do set membership or should we Merkleize the Merkle forest roots (i.e. $MR_A$, $MR_B$,...) as well.
- Say on each anchor we have $H_A, H_B,...$ which store the historical Merkle forest roots of anchors $A$, $B$,....
- To make a transaction proof, the prover submits an array of Merkle forest roots $MR_A \in H_A$, $MR_B \in H_B$, etc.
- It is then checked using the `isValidRoots` function whether $MR_A$, $MR_B$, etc. are valid roots.
- If so, they are then Merkleized to form a merkle root $MR$. The on-chain Merkleization computational cost is $N * \text{cost of Poseidon Hash function}$.
- $MR$ is submitted as a public input to the ZKP.
- Within the ZKP a merkle proof is done against $MR$. The number of constraints this adds is $\log(N) * \text{constraints of Poseidon Hash function}$. This is in contrast to the set membership approach, which adds $O(N)$ constraints.
So in summary the Merkleize approach:
- Adds a computational cost of $N * \text{cost of Poseidon Hash function}$ for Merkleizing the historical Merkle forest roots on-chain.
- Adds a cost of $\log(N) * \text{constraints of Poseidon Hash function}$ constraints in the ZKP.
The set membership approach:
- Adds a cost of $O(N)$ constraints in the ZKP.
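For intuition on where the set-membership $O(N)$ comes from: a standard gadget constrains $\prod_i (x - r_i) = 0$ over the scalar field, one multiplication constraint per set element. A sketch with a toy Mersenne prime standing in for the BN254 scalar field:

```python
# Sketch of the O(N) set-membership gadget: constrain
# prod_i (x - r_i) == 0, one multiplication per set element.
P = 2**61 - 1  # toy prime modulus standing in for the real field

def set_membership(x, roots):
    """Return (final product, number of multiplication constraints)."""
    acc = (x - roots[0]) % P
    muls = 0
    for r in roots[1:]:
        acc = (acc * ((x - r) % P)) % P
        muls += 1
    return acc, muls

roots = [5, 9, 13, 21]
assert set_membership(9, roots) == (0, 3)    # member: product vanishes
assert set_membership(7, roots)[0] != 0      # non-member: product nonzero
assert set_membership(7, roots)[1] == len(roots) - 1
```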
The question is how do these computational costs / ZKP constraints translate to gas costs. This can likely only be figured out by experimental benchmarking.
The gas costs we can assume are roughly fixed for witnesses (with Groth16 they can be as large as we want; with Plonk they probably scale with the number of constraints), but they scale linearly with the number of public inputs (which should therefore be minimized).
In the set membership approach there is a root array of size $N$, as part of public inputs. So cost scales as $O(N)$.
On the other hand, the Merkleization approach incurs an extra computational cost of $N * \text{cost of Poseidon Hash function}$ from needing to Merkleize $MR_A, MR_B,...$ on-chain, so its gas cost also scales as $O(N)$.
Consequently, a mere theoretical analysis won't be enough since both are adding $O(N)$ gas costs in different ways. So we need to do empirical studies.