Proof verification optimisation for pallet-beefy-mmr
While looking into Ethereum gas optimisation for Snowbridge, we noticed one of OpenZeppelin's optimisations for proof verification. This amounts to sorting each pair of hashes before hashing them during proof creation, which removes the need to track & reproduce the order in which the hash function is applied during proof verification.
In our case, this makes it more efficient to verify proofs on Ethereum because we can perform a single comparison to determine the order when combining hashes.
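For illustration, here is a minimal sketch of the sorted-pair trick (assuming the `tiny_keccak` crate; this is not the pallet's actual code): both prover and verifier hash each pair in lexicographic order, so the verifier never needs to know whether a proof item sat on the left or the right.

```rust
use tiny_keccak::{Hasher, Keccak};

fn keccak256(data: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak::v256();
    hasher.update(data);
    let mut out = [0u8; 32];
    hasher.finalize(&mut out);
    out
}

/// Hash a pair of nodes in a fixed (sorted) order, so the result is the
/// same regardless of which operand was the left or the right child.
fn hash_pair_sorted(a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(&lo);
    buf[32..].copy_from_slice(&hi);
    keccak256(&buf)
}

/// Verify a single proof by folding the leaf hash with each proof item.
/// Because pairs are sorted before hashing, no left/right flags are needed.
fn verify_sorted(root: [u8; 32], leaf_hash: [u8; 32], proof: &[[u8; 32]]) -> bool {
    let computed = proof
        .iter()
        .fold(leaf_hash, |acc, item| hash_pair_sorted(acc, *item));
    computed == root
}
```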
This would be a breaking change to proof verification, so we'd need to communicate this to the relevant projects.
I've made this change in a fork and am working on updating the tests there. Are there any other considerations or knock-on effects I should deal with before opening a PR?
cc @acatangiu @Lederstrumpf
No knock-on effects I can think of; feel free to move ahead with the PR.
Unfortunately this optimization is not resistant to second-preimage attacks. Without this safeguard it becomes possible for an attacker to shorten or lengthen the proofs (essentially describing an alternative tree) in order to fool the verifier into accepting that some non-existent items exist in the original tree.
For single proofs you can easily prevent this by asserting that the proof length is the same as the original tree height. This can be done by calculating the height of the tree using the number of leaves at the time the tree was created.
And while this vulnerability only affects the multiproof scheme, in order to truly optimize the gas verification costs for the beefy authority set it's necessary to use a merkle multiproof verifier.
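For the single-proof guard mentioned above, a minimal sketch, assuming a (near-)perfect binary tree whose height is derived from the leaf count recorded at creation time; an unbalanced tree would need a slightly looser bound:

```rust
/// Height of a binary Merkle tree over `leaf_count` leaves,
/// i.e. ceil(log2(leaf_count)).
fn tree_height(leaf_count: u64) -> u32 {
    assert!(leaf_count > 0, "tree must have at least one leaf");
    if leaf_count == 1 {
        0
    } else {
        64 - (leaf_count - 1).leading_zeros()
    }
}

/// Guard against proof shortening/lengthening for single proofs:
/// a valid proof must contain exactly `tree_height` sibling hashes.
fn check_proof_length(leaf_count: u64, proof_len: usize) -> bool {
    proof_len as u32 == tree_height(leaf_count)
}
```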
Gas Performance
@openzeppelin/contracts
Number of hashes required for single proofs of 667 items in a 1000-leaf tree (Kusama validators):
log2(1000) = 10;
667 (leaves) * 10 (tree height) + 667 (leaf hashes) = 7337
@polytope-labs/solidity-merkle-trees
Multiproofs allow for the re-use of intermediate hashes when re-calculating the root hash.
number of hashes in a multiproof of 667 leaves in a 1000-leaf tree = 672
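A quick back-of-the-envelope reproduction of the two figures above (the 672 multiproof count is taken from the solidity-merkle-trees benchmarks rather than recomputed here):

```rust
fn main() {
    let total_leaves: u64 = 1_000; // current Kusama validator set size
    let proven_leaves: u64 = 667; // number of items being proven

    // Height of the tree: ceil(log2(1000)) = 10.
    let height = (total_leaves as f64).log2().ceil() as u64;

    // Single proofs: one path of `height` hashes per proven leaf,
    // plus one hash per leaf to derive the leaf hash itself.
    let single_proof_hashes = proven_leaves * height + proven_leaves;
    assert_eq!(single_proof_hashes, 7_337);

    // Multiproof figure quoted from the benchmarks: intermediate hashes
    // are shared across the 667 paths, so far fewer hashes are needed.
    let multiproof_hashes = 672_u64;

    println!("single proofs: {single_proof_hashes} hashes, multiproof: {multiproof_hashes} hashes");
}
```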
Benchmarks
polytope-labs/solidity-merkle-trees
Recommendations
Based on this, we should revert: https://github.com/paritytech/substrate/pull/12857
Just an update from the Snowfork side: we're mostly debating this issue in the Polkadot Bridges group on Element: https://matrix.to/#/#bridges:web3.foundation.
Given that keccak256 is (still™) collision resistant (else, Ethereum would have bigger fish to fry), a practical exploit of this attack is infeasible, unless an implementation issue such as https://eprint.iacr.org/2023/331 sneaks through. As such, an exploit seems, for the foreseeable future, theoretical in nature.
Nonetheless, @swasilyev pointed me towards a 2nd preimage attack against Bitcoin light client SPV proofs, which reduces the brute-force requirement from 256 to about 70 bits (modulo economic assumptions about the attacker that were plausible at the time) for SPV clients without safety guards. In this case, the Bitcoin tx structure (i.e. the leaves in SPV proofs) was critical to the nature of the attack: since transactions can be 64 bytes in Bitcoin, the leaves themselves could be valid internal nodes. In our case, where the MMR leaves are 232 bytes, it's only the leaf hashes that have the same length (32 bytes) as node hashes and entertain this ambiguity, so the barrier here is still collision resistance.
If there's another argument to be made that we should install safeguards against 2nd preimage attacks, here are the mitigation options I've tallied up so far:
1. go ahead with polytope's 2d multiproofs, as proposed by @seunlanlege. What I like about the 2d multiproofs is that the logic of the proof verification algorithm is very transparent. Switching to 2d multiproofs can be considered independently of 2nd preimage attacks if it benchmarks better than openzeppelin's multiproofs, but should also weigh the engineering & auditing effort of migrating the frame pallets to this scheme, given that they currently support batch proofs a la nervos already.
2. check the proof size `|leaves| + |proof_items|`. This is cheap (assuming one knows the mmr size) and can be done at different depths, depending on the desired mitigation level:
   - upper bound of `log(mmr_size)`
   - upper bound of the worst case for `|leaves|`
   - calculate the exact proof size from the positions of the leaves (we don't supply these to proofs currently)
3. use distinct prefixes when hashing leaves & nodes, as suggested by @AlistairStewart (see the sketch at the end of this comment)
4. use distinct hash functions for leaves & nodes, as recommended by openzeppelin itself. The options we have are:
   - i. ripemd-160 precompile
   - ii. implement blake2b around the blake2b compression precompile
   - iii. sha256 precompile
   - iv. double hash keccak256

   Of these alternatives, I'd go for 4.iii or 4.iv, and their gas cost is the same (for hashing a 32-byte input, sha256 would be `60 + 12` vs. keccak at `2 * (30 + 6)`).
5. check that leaves aren't ever 32 bytes to avoid ambiguity with nodes, also suggested by @AlistairStewart. MMR leaves are (currently) 232 bytes, so this is easily guaranteed.
As such, I'd keep #12857. If there's still concern that 2nd preimage attacks should be guarded against, we can deploy one of the above fixes.
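For illustration, a minimal sketch of option 3 (distinct prefixes for leaves and inner nodes), assuming the `tiny_keccak` crate; the `0x00`/`0x01` prefix bytes are placeholders rather than values used by any existing pallet. With the prefixes in place, a 64-byte blob hashed as a leaf can no longer collide with an inner node built from its two 32-byte halves:

```rust
use tiny_keccak::{Hasher, Keccak};

// Illustrative domain-separation tags; the concrete values are a design choice.
const LEAF_PREFIX: u8 = 0x00;
const NODE_PREFIX: u8 = 0x01;

fn keccak256(data: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak::v256();
    hasher.update(data);
    let mut out = [0u8; 32];
    hasher.finalize(&mut out);
    out
}

/// Hash a leaf with a leaf-specific prefix.
fn hash_leaf(leaf: &[u8]) -> [u8; 32] {
    let mut buf = Vec::with_capacity(1 + leaf.len());
    buf.push(LEAF_PREFIX);
    buf.extend_from_slice(leaf);
    keccak256(&buf)
}

/// Hash an inner node with a node-specific prefix.
fn hash_node(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut buf = Vec::with_capacity(65);
    buf.push(NODE_PREFIX);
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    keccak256(&buf)
}

fn main() {
    // A 64-byte leaf whose halves an attacker would like to pass off as
    // the two child hashes of an inner node.
    let leaf = [0xabu8; 64];
    let left: [u8; 32] = leaf[..32].try_into().unwrap();
    let right: [u8; 32] = leaf[32..].try_into().unwrap();

    // With domain separation the two interpretations no longer collide.
    assert_ne!(hash_leaf(&leaf), hash_node(&left, &right));
    println!("leaf/node domain separation holds");
}
```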
> go ahead with polytope's 2d multiproofs, as proposed by @seunlanlege. What I like about the 2d multiproofs is that the logic of the proof verification algorithm is very transparent. Switching to 2d multiproofs can be considered independently of 2nd preimage attacks if it benchmarks better than openzeppelin's multiproofs, but should also weigh the engineering & auditing effort of migrating the frame pallets to this scheme, given that they currently support batch proofs a la nervos already.
I want to add here that the 2d multiproof isn't a new scheme; the previous merkle tree scheme was simply a standard construction with no node sorting. The 2d multiproof is an approach to supporting multiproofs for standard merkle trees.
From the benchmarks I've posted here, the multiproof scheme is more efficient for larger subsets than single proofs.
I'm still not fully on board with openzeppelin's multiproofs, but it would be interesting to see benchmarks against solidity-merkle-trees.
> 5. check that leaves aren't ever 32 bytes to avoid ambiguity with nodes, also suggested by @AlistairStewart. MMR leaves are (currently) 232 bytes, so this is easily guaranteed.
Just a correction: Should check that leaves aren't ever 64 bytes to avoid ambiguity with nodes.
I can also confirm that @AlistairStewart's recommendations are backed by the OpenZeppelin team in this issue: https://github.com/OpenZeppelin/openzeppelin-contracts/issues/3091, where they suggest the same methods for ensuring domain separation between internal nodes and leaves.
These mitigations are already in place in Polkadot/Substrate
For BEEFY validator root
Leaves are 20-byte Ethereum addresses
- https://github.com/paritytech/substrate/blob/e46d96cb843b8ae025d2aae8abb46035a6df738c/frame/beefy-mmr/src/lib.rs#L200
- https://github.com/paritytech/substrate/blob/e46d96cb843b8ae025d2aae8abb46035a6df738c/frame/beefy-mmr/src/lib.rs#L78
- https://github.com/paritytech/substrate/blob/e46d96cb843b8ae025d2aae8abb46035a6df738c/frame/support/src/crypto/ecdsa.rs#L32
For parachain heads root
Leaves are greater than 64 bytes. They are of the form:
`(para_id: u32, head_data: Vec<u8>).encode()`
And for all parachains in existence, `head_data` is always greater than 32 bytes. Think parent_head, block_number, state_root, etc. Of course this may change, as I think there is nothing in the protocol that prevents `head_data` from just containing a 32-byte parent hash, in which case such a leaf could be exploited.
I think a simple solution is to just filter out parachains/parathreads with such a property. The merkle tree construction already does this for parathreads.
https://github.com/paritytech/polkadot/blob/e9bf067c68e806cdb272e0ce8065faa543cc0b5a/runtime/rococo/src/lib.rs#L1292
> Of course this may change, as I think there is nothing in the protocol that prevents `head_data` from just containing a 32-byte parent hash, in which case such a leaf could be exploited.
>
> I think a simple solution is to just filter out parachains/parathreads with such a property. The merkle tree construction already does this for parathreads.
Just correcting myself, actually there is no vulnerability here. Supposing `head_data` is 32 bytes, then `(para_id, head_data).encode()` will actually be 65 bytes in length, due to the compact SCALE encoding of the vector length.
So in all cases, leaves for the parachain heads root are always equal to or greater than 65 bytes in length, and are thus not exploitable via second-preimage attacks.
Closing since from discussions with @doubledup and also on other Element channels, multiproofs aren't on the table for Snowfork atm anyway, given that via the benchmarks in https://github.com/doubledup/optimisation-beefy-merkle-proofs, Polytope's multiproof of 14 sigs (i.e. the sample size used in production for now) in a tree of 1000 leaves (i.e. current Kusama) is currently performing worse than doing the same serially with singleproofs, and singleproofs are performing sufficiently well as it stands. As such, let's keep the optimization introduced by https://github.com/paritytech/substrate/pull/12857. Using Polytope's or OpenZeppelin style multiproofs should be retabled if, say, the parameters change and we reconsider multiproofs for beefy authorities, or if we end up using multiproofs for verifying parachain headers across multiple blocks.
@Lederstrumpf @acatangiu Can we revert the optimization in https://github.com/paritytech/substrate/pull/12857? Unfortunately it introduced a security vulnerability, which we only discovered this week. Apologies for the time everyone spent debating this optimization.
The problem is that in our interactive BEEFY light client protocol, we assign each BEEFY validator a unique index: the position of its leaf in the validator merkle tree. We record these indices in a bitfield, and manipulate the bitfield as part of our validator subsampling algorithm.
Offchain relayers must provide the leaf index and leaf (validator address), and the light client verifies those using the merkle proof verifier discussed in this issue. However, we were so focused on verifying the leaf in an optimized fashion that we forgot the leaf index needs to be verified too.
The optimization introduced in https://github.com/paritytech/substrate/pull/12857 prevents us from verifying the leaf index along with the leaf and proof, because it sorts leaves, messing up the leaf order.
> @Lederstrumpf @acatangiu Can we revert the optimization in https://github.com/paritytech/substrate/pull/12857? Unfortunately it introduced a security vulnerability, which we only discovered this week. Apologies for the time everyone spent debating this optimization.
>
> The problem is that in our interactive BEEFY light client protocol, we assign each BEEFY validator a unique index: the position of its leaf in the validator merkle tree. We record these indices in a bitfield, and manipulate the bitfield as part of our validator subsampling algorithm.
>
> Offchain relayers must provide the leaf index and leaf (validator address), and the light client verifies those using the merkle proof verifier discussed in this issue. However, we were so focused on verifying the leaf in an optimized fashion that we forgot the leaf index needs to be verified too.
>
> The optimization introduced in https://github.com/paritytech/substrate/pull/12857 prevents us from verifying the leaf index along with the leaf and proof, because it sorts leaves, messing up the leaf order.
Of the opinion there should be a more democratic process for core protocol changes like this.
@Lederstrumpf as per our discussion, here's a high-level overview of why leaf indices are important for our validator subsampling protocol, and how our optimization tripped us up.
The offchain relayer scans all Polkadot blocks and looks for `SignedCommitment` objects within the justifications for a block.
This `SignedCommitment` structure contains an ordered list of `Option<Signature>`, indicating which beefy validators did or did not sign the commitment.
The index of each signature corresponds to the index of the validator address in the beefy validators merkle tree.
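For context, this is roughly the shape of that structure (a simplified sketch; the real definitions in Substrate's BEEFY primitives are generic over the block number, payload and signature types):

```rust
/// Simplified sketch of the commitment a BEEFY validator signs; the real
/// type carries a payload such as the MMR root hash.
struct Commitment {
    payload: Vec<u8>,
    block_number: u32,
    validator_set_id: u64,
}

/// Simplified sketch of SignedCommitment: one signature slot per validator,
/// in the same order as the validator merkle tree leaves, so the slot index
/// is exactly the validator's leaf index.
struct SignedCommitment {
    commitment: Commitment,
    /// `Some(signature)` if validator `i` signed, `None` otherwise.
    signatures: Vec<Option<[u8; 65]>>,
}
```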
Now with this information in hand, relayers are able to participate in an interactive protocol with our beefy light client:
1. Firstly, the relayer submits an initial bitfield claiming which beefy validators signed a commitment. The positions in the bitfield correspond to the positions of the signatures in the `SignedCommitment` structure above.
2. The light client challenges the relayer to provide signatures from a random subset of validators in the initial bitfield. I'll skip over the mechanics of this challenge for now.
3. The relayer responds to the challenge, providing a list of signature proofs:
```solidity
/**
 * @dev The ValidatorProof is a proof used to verify a commitment signature
 * @param v the parity bit to specify the intended solution
 * @param r the x component on the secp256k1 curve
 * @param s the challenge solution
 * @param index index of the validator address in the merkle tree
 * @param account validator address
 * @param proof merkle proof for the validator
 */
struct ValidatorProof {
    uint8 v;
    bytes32 r;
    bytes32 s;
    uint256 index;
    address account;
    bytes32[] proof;
}
```
4. The light client verifies those signature proofs. In the function isValidatorSet, we found the security flaw. We're not validating that the supplied leaf index corresponds to the leaf (validator address) while verifying the merkle proof. Without this validation, malicious relayers can subvert our validator subsampling algorithm by specifying fake leaf indices (i.e. they control which validators are sampled).
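To make the gap concrete, here is a minimal sketch (assuming the `tiny_keccak` crate; purely illustrative, not the Snowbridge light client code) of position-aware verification, where the leaf index drives the left/right placement at each level and is therefore authenticated by the proof itself. With sorted-pair hashing the index never enters this computation, which is exactly the flaw described above.

```rust
use tiny_keccak::{Hasher, Keccak};

fn keccak256(data: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak::v256();
    hasher.update(data);
    let mut out = [0u8; 32];
    hasher.finalize(&mut out);
    out
}

fn hash_pair(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(left);
    buf[32..].copy_from_slice(right);
    keccak256(&buf)
}

/// Position-aware single-proof verification: at each level, the current bit
/// of `leaf_index` decides whether the running hash is the left or right
/// child, so a wrong index produces a different root and the proof fails.
fn verify_with_index(
    root: [u8; 32],
    leaf_hash: [u8; 32],
    mut leaf_index: u64,
    proof: &[[u8; 32]],
) -> bool {
    let mut acc = leaf_hash;
    for sibling in proof {
        acc = if leaf_index % 2 == 0 {
            hash_pair(&acc, sibling) // we are the left child at this level
        } else {
            hash_pair(sibling, &acc) // we are the right child at this level
        };
        leaf_index /= 2;
    }
    acc == root
}
```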
@vgeddes thanks for the links to the implementation locations. I currently see three options that might either salvage the lexical hash ordering optimisation, or have a distinct performance gain over the unoptimised scheme - feedback & other suggestions welcome: https://hackmd.io/PDs1gnMmSoSVsQfuUj1Piw
That being said, I agree with protecting current launch timeline, and deploying any new optimisations post-launch. So ack from me to revert the lexical hash ordering.
> Of the opinion there should be a more democratic process for core protocol changes like this.
Agree!
Right now, BEEFY is still experimental and in development, but once we stabilize and write a SPEC, future changes will be more formalized and democratized. For now, to keep a good balance between development velocity and discussions/decisions around breaking changes, the expectation is that interested parties actively follow the topic and speak up.
Even so, I propose we move or at least mirror this conversation to the Polkadot Forum for more visibility.
wdyt?
> Even so, I propose we move or at least mirror this conversation to the Polkadot Forum for more visibility.
Agreed ser