
Epic: Pleistarchus

LHerskind opened this issue 6 months ago

Pleistarchus is the successor of Leonidas. It will enforce the same rules, but take different routes to do so. Beyond that, it is the "faster" rollup, so other optimisations will fall under this epic for tracking purposes.

### Directly Sequencer Related
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/8761
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/8762
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/8763
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/8764
- [ ] feat: Efficient data provision
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/8765
- [ ] https://github.com/AztecProtocol/aztec-packages/issues/9184
### Other optimisation tasks
- [ ] #8114

### Commit to the committee

In our case, we alter how the validator set and the committees are stored to reduce overhead.

In Leonidas, a full list of addresses is used to identify the committee in each epoch. This is super convenient but has a gigantic cost, as we end up writing to a LOT of storage.

To reduce this cost, one approach would be to compute the full committee (in memory) and then just insert the hash into the Epoch struct. That way, our storage cost is not dependent directly on the number of members in the committee.

However, when we need to use that information later, we will need to either:

  • Provide the committee in calldata and check that it hashes to the stored value, or
  • Recompute the committee and check that the hashes match

Before deciding, consider that the swap-or-not shuffle that we use for our sampling is not gas-efficient; it is very costly. Doing a single index computation is currently (in our unoptimised version) more expensive than an `sstore` 😬.
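For a concrete picture of what one index computation involves, here is a Python sketch of a swap-or-not shuffled-index computation, in the style of the Ethereum beacon chain's `compute_shuffled_index`. The hash function, round count, and byte layout here are illustrative stand-ins, not the parameters Leonidas actually uses:

```python
import hashlib

def shuffled_index(index: int, size: int, seed: bytes, rounds: int = 90) -> int:
    """Swap-or-not shuffle of a single index (illustrative parameters)."""
    assert 0 <= index < size
    for r in range(rounds):
        round_seed = seed + r.to_bytes(1, "little")
        # Per-round pivot derived from the seed.
        pivot = int.from_bytes(hashlib.sha256(round_seed).digest()[:8], "little") % size
        # Candidate position to swap with.
        flip = (pivot + size - index) % size
        position = max(index, flip)
        # One bit decides whether to swap; both elements of the pair read
        # the same bit (via max), which is what makes this a permutation.
        source = hashlib.sha256(
            round_seed + (position // 256).to_bytes(4, "little")
        ).digest()
        bit = (source[(position % 256) // 8] >> (position % 8)) & 1
        index = flip if bit else index
    return index
```

The point to notice is that computing one member's index takes on the order of `rounds` hash invocations, which is exactly what makes doing it on-chain so costly.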

That leaves providing the committee as our only real option. Even then, we have a few variants to choose from; see "Kick it up a notch" below for the advanced one.

By providing the data, we need to extend the calldata by 32 bytes for every member of the committee and compute a hash over it. It may be a non-trivial amount, but it is still much cheaper than recomputing the committee.
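As a sketch of that commit-then-provide flow (hypothetical helper names; on-chain this would be something like `keccak256` over the packed addresses, with SHA-256 standing in here):

```python
import hashlib

def committee_commitment(members: list[bytes]) -> bytes:
    # Commit to the ordered list of 20-byte addresses with a single hash,
    # so the storage cost is one word regardless of committee size.
    return hashlib.sha256(b"".join(members)).digest()

def verify_committee(provided: list[bytes], stored: bytes) -> bool:
    # Later callers pass the full committee in calldata; we only re-hash.
    return committee_commitment(provided) == stored
```

Note that the commitment is order-sensitive: the same members in a different order yield a different hash.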

### Validator Set Snapshots

Then we run into a new problem: how do we compute the committee after the fact?

As you might know if you have looked at the Leonidas code, the validator set CAN change while the epoch committee stays stable. This is because we compute the committee and store it for later use before we alter the validator set (which would affect the sampling). As a result, if you try to recompute the committee using a different validator set (even with the same seed), you will end up with a different committee!

To solve this issue, we need to have validator snapshots. Essentially, for every epoch that happened in the past, we want to be able to look up who was in the validator set.

This can be handled in a couple of ways:

  • Deal with it off-chain
  • Have on-chain snapshots

For the sake of simplicity, I think we should go for on-chain snapshots. One approach would be to build Merkle trees, but I think we can do something much simpler by taking an approach similar to snapshot tokens.

First, we keep track of a list of snapshots for the validator set size.

```solidity
struct EpochSizeSnapshot {
    // Size of the validator set as of `epochNumber`.
    uint128 size;
    // Epoch at which this snapshot was taken.
    // uint128 + uint96 fit together in a single storage slot.
    uint96 epochNumber;
}

EpochSizeSnapshot[] public sizeSnapshots;
```

Say we have a list of these and always append at the end. To get the current size, simply read the latest snapshot. If you need the size for an epoch in the past, go back until you find the first entry whose `epochNumber` is <= the epoch you are searching for (use binary search, and a hint, for efficiency).
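The lookup just described can be sketched as follows (illustrative Python; `snapshots` mirrors `sizeSnapshots` as `(epochNumber, size)` pairs sorted by epoch):

```python
from bisect import bisect_right

def size_at(snapshots: list[tuple[int, int]], epoch: int) -> int:
    # Binary-search for the latest snapshot whose epochNumber is <= epoch.
    i = bisect_right(snapshots, (epoch, float("inf")))
    if i == 0:
        raise LookupError("no snapshot at or before this epoch")
    return snapshots[i - 1][1]
```

On-chain, a caller-supplied hint (a claimed array position) could let the contract verify the position in O(1) instead of binary searching.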

Beyond this, say that for every index, we have a list of validator snapshots:

```solidity
struct ValidatorSnapshot {
    // Validator occupying this index as of `epochNumber`.
    address validator;
    // address + uint96 fit together in a single storage slot.
    uint96 epochNumber;
}

mapping(uint256 index => ValidatorSnapshot[] snapshots) public validators;
```

Using this, we have all the data we need!

When computing the committee to set up the epoch, we can simply use the latest snapshots. And when we need the data later, we just take the values from the snapshots that match the epoch in question.

The latter might be quite expensive, but the good news is that we don't actually need to do it within a transaction: we can do it fully off-chain and then provide the result (the committee) to our other function later.
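The off-chain reconstruction is then a pair of lookups, sketched here with hypothetical names (`size_snaps` mirrors `sizeSnapshots`, and `validator_snaps` mirrors the `validators` mapping):

```python
def _latest_at(snaps, epoch):
    # Latest (epochNumber, value) entry with epochNumber <= epoch.
    best = None
    for e, v in snaps:
        if e <= epoch:
            best = v
    return best

def validator_set_at(size_snaps, validator_snaps, epoch):
    # Reconstruct the validator set exactly as it was in `epoch`:
    # first recover the historical size, then each index's occupant.
    size = _latest_at(size_snaps, epoch)
    return [_latest_at(validator_snaps[i], epoch) for i in range(size)]
```

From this historical set (plus the epoch's seed), the committee can be recomputed off-chain and the result handed to the contract for hash matching.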

### Be Optimistic

Even with all of this, the initial computation of the committee is untouched and REALLY expensive; we are talking 60K gas PER committee member.

However, we now have the necessary tools to be optimistic. By this I mean that we can allow the caller of the setup to directly provide a value that they claim is the commitment to the committee, along with a bond, and then allow anyone to jump in, show that the claim is wrong, and take the funds.

This will require that we can compute the committees earlier than otherwise, because we need to leave enough time to challenge the claim.

Interestingly, a challenger does not actually need to do the full computation on-chain (which would be as expensive as before); they only need to show that at least ONE of the members is incorrect. This means we can get away with providing $n-1$ members as-is and computing just $1$ index 👀, so a challenge would cost somewhere around 100K gas (including 60K for the index computation).
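A sketch of the challenge (the sampling function here is a cheap hash-based stand-in for the real swap-or-not shuffle, and the names are hypothetical):

```python
import hashlib

def sample_member(validators: list[str], seed: bytes, committee_index: int) -> str:
    # Stand-in for the real (expensive) shuffle-based sampling of one member.
    h = hashlib.sha256(seed + committee_index.to_bytes(4, "big")).digest()
    return validators[int.from_bytes(h, "big") % len(validators)]

def challenge(claimed: list[str], validators: list[str], seed: bytes, i: int) -> bool:
    # The challenger recomputes ONE index and wins iff it disagrees with
    # the claimed committee; the other n-1 members are taken as provided.
    return claimed[i] != sample_member(validators, seed, i)
```

A successful challenge is what slashes the proposer's bond and pays the challenger.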


### Kick it up a notch

#### Efficient Data Provision

When providing the members of the committee, we could "abuse" the fact that we will already be providing signatures for at least a subset of them, $\dfrac{2}{3}$. So we could practically provide just the missing $\dfrac{1}{3}$ as addresses and recover the rest from the signatures. This requires quite a bit of fiddling with input params and encoding, so let's save that for now.

LHerskind, Aug 14 '24 14:08