MMR Proof Generation is broken

Open acatangiu opened this issue 3 years ago • 6 comments

Is there an existing issue?

  • [X] I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • [X] This is not a support question.

Description of bug

Shortly after deploying polkadot:v0.9.27, Rococo fails to generate MMR proofs using mmr RPC, for example on block 1353786:

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":[1353785, "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1"]}' https://rococo-rpc.polkadot.io | jq .
{
  "jsonrpc": "2.0",
  "error": {
    "code": 8012,
    "message": "Error while generating the proof",
    "data": "Error::GenerateProof"
  },
  "id": 1
}
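
For anyone scripting this check, the request body can be built with a small helper so that the leaf index and block hash are substituted without juggling nested quotes. This is only a sketch: the function name mmr_proof_request is made up for illustration, and it assumes the two-parameter form of mmr_generateProof (leaf index, block hash) used in the call above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): prints the JSON-RPC body for
# mmr_generateProof. $1 = leaf index, $2 = 0x-prefixed block hash.
mmr_proof_request() {
  printf '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":[%s, "%s"]}' "$1" "$2"
}

payload=$(mmr_proof_request 1353785 "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1")
echo "$payload"

# To send it (needs network access):
# curl -H "Content-Type: application/json" -d "$payload" https://rococo-rpc.polkadot.io | jq .
```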

Steps to reproduce

  1. Open polkadot.js/apps
  2. Connect to Rococo
  3. Go to Developer -> RPC Calls -> mmr -> generateProof
  4. Call it with leafIndex =

This produces the error:

8012: Error while generating the proof: Error::GenerateProof

Aug 05 '22 14:08 acatangiu

ckb-merkle-mountain-range-0.3.2 used by pallet-mmr is throwing InconsistentStore error when trying to generate proofs:

2022-08-05 13:50:45.623 ERROR tokio-runtime-worker runtime::mmr: [<wasm:stripped>] MMR error: InconsistentStore

Need to dive deeper for root cause, but likely culprit is recent PR https://github.com/paritytech/substrate/pull/11594

Aug 05 '22 14:08 acatangiu

Need to dive deeper for root cause, but likely culprit is recent PR #11594

Maybe I'm missing something, but I don't think that's the cause since the onchain runtime version on rococo is still 9250, which barely precedes inclusion of that PR.

ckb-merkle-mountain-range-0.3.2 used by pallet-mmr is throwing InconsistentStore error when trying to generate proofs

I've tried this for a couple thousand blocks, and the InconsistentStore error consistently originates from this line: https://github.com/nervosnetwork/merkle-mountain-range/blob/master/src/mmr.rs#L125

I've attached a tracelog of a call of mmr_generateProof with params [1394039, 0xdf388aada7be7d84bd5f9834b6480d2cd6afa6c298f12e9ff8fe9754a5c8e5e2]

rococo_generateProof_1394039_0xdf388aada7be7d84bd5f9834b6480d2cd6afa6c298f12e9ff8fe9754a5c8e5e2.log

Aug 09 '22 10:08 Lederstrumpf

Running a node from genesis doesn't exhibit this problem, only nodes started from snapshot have this broken db issue.

Unfortunately we only have snapshots from the last 5 days and they are all broken - all current nodes have been semi-recently upgraded and restarted from a broken snapshot, and so we have no older good snapshot.

Aug 09 '22 11:08 acatangiu

The problem is that the offchainDB is incomplete, it holds no entries for old leaves (such as leaf indexes 0, 1, 3, 1000000); only for leaves pertaining to blocks added after node was started from snapshot.

This is most likely due to the node being started from an incomplete snapshot. The incomplete snapshot was most likely generated on a node without --enable-offchain-indexing=true parameter, and thus the node had no offchain db entries for MMR leaves.

To fix this, it should be enough to restart nodes from a good snapshot - a snapshot generated from an archive node running with --enable-offchain-indexing=true.

Aug 09 '22 13:08 acatangiu

The Rococo sync nodes have been restarted with the correct flags and are currently re-syncing.

Aug 09 '22 15:08 acatangiu

Running a node from genesis doesn't exhibit this problem, only nodes started from snapshot have this broken db issue.

Can report that it's working on rococo for me too when syncing from scratch with --enable-offchain-indexing=true enabled from the beginning.

The problem is that the offchainDB is incomplete, it holds no entries for old leaves (such as leaf indexes 0, 1, 3, 1000000); only for leaves pertaining to blocks added after node was started from snapshot.

This is most likely due to the node being started from an incomplete snapshot. The incomplete snapshot was most likely generated on a node without --enable-offchain-indexing=true parameter, and thus the node had no offchain db entries for MMR leaves.

In case we run into this again, I've debugged with a local chain to confirm that the offchainDB can actually be incomplete, so long as all leaves required in the proof are present.

Using a local chain where I:

  • enabled indexing until block 35,
  • disabled it until block 70, and then
  • reenabled until block 268,

i.e. with entries for leaves 36-69 missing in the offchainDB, these are the failure/success scenarios I get:


(feel free to skip remainder of this comment - just keeping a detailed record in case the proof generation breaks again) (here's an archive of the associated chain state: issue-11984-interrupted-indexing-state.tar.gz)


A. leaf_index ≤ 35

For mmr_generateProof called for leaves 0-35, it works as long as the mmr_size is at most 35, since otherwise the leaf's copath to the root contains at least one leaf with index 36-69.

mmr_size ≤ 35 (leaf indexed and full path available)

block_height=35
method="chain_getBlockHash"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$block_height']}' 65.108.96.98:9934 | jq .result
"0x642476aaddbbc65dc589cc2d801d8973dc5a394698c9860e17788531ab77dc36"

leaf_index=14;
block_35_hash="0x642476aaddbbc65dc589cc2d801d8973dc5a394698c9860e17788531ab77dc36"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_35_hash'"]}' 65.108.96.98:9934 | jq .result.proof
"0x0e00000000000000230000000000000018b4c9b914a645c24e6d056edf8f905bb6ae80078aadc5b9ea77c2ba4ef45c835afa0360a215717f7b37aae3f96b4d38166bec3ddb412d7e9aa5ae2867783344f1ed33d4c3b7e34902581e2870bb04884b2aabecdc280e73e796aed8480d16391f1d259475eb1383cb48989a575e6ed9c83e5289c9647309217c00b825a2da763a04de585a62641e1ee66573e8e02c6fcad042cc0c9e27e3991ee3b3917cddb5ce239bbe1cc0ab39c0cf00c334a54765faeaeae0740ee531751d432044d7da19d6"

mmr_size > 35 (leaf indexed but full path not available)

block_height=200
method="chain_getBlockHash"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$block_height']}' 65.108.96.98:9934 | jq .result
"0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"

leaf_index=14;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .error
{
  "code": 8012,
  "message": "Error while generating the proof",
  "data": "Error::GenerateProof"
}

B. leaf_index > 35

35 < leaf_index ≤ 127 (leaf indexed but full path not available)

mmr_generateProof always fails since the leaf's copath to the root contains at least one leaf with index 36-69.

leaf_index=126;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .error
{
  "code": 8012,
  "message": "Error while generating the proof",
  "data": "Error::GenerateProof"
}

leaf_index > 127 (leaf indexed and full path available)

Despite the db being incomplete, mmr_generateProof succeeds since the leftmost peak now covers 128 leaves, not 32 or 64, so everything required on the copath can be found.

leaf_index=128;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209",
    "leaf": "0xc5010080000000a1425cc134f8019e8d8afcab8d1adc0328a90c5d84c359b7f448aa81d924c7c40f0000000000000002000000697ea2a8fe5b03468548a7a413424a6292ab44a82a6f5cc594c3fa7dda7ce4020000000000000000000000000000000000000000000000000000000000000000",
    "proof": "0x8000000000000000c80000000000000020b68f2eecca814fc9eca34ae08f78824a15413275e3511ac22b7f294a95723e356eb85b6931870b476750cc14799a0ff6c2c8d570f2d9953f2256ac117b7813bec36d9f2084db0961c915c75ac61754cbd7e8959e7666d16ffd21a18819f9024f5a11883fdd7cf413a2b7c532e1923432ce5eff73a717792d47d59e366532fe404f3a59caa48cdd3694b0eb73dc73c168702fd67f11efbf159064ca6b522e54da03b22c78eda892112faa3d0b76479b1d82aff960db06576b1d8fd82eae1b461fa8815288087eb7c93e88c1a94fa0400b7081db364240eba8cdd14f0aed2c4f824477fd54800ffd8e8063fc904fc05db631af30c06f1ff8ad54e1e60570bf2f40"
  },
  "id": 1
}

leaf_index=199;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209",
    "leaf": "0xc50100c70000005336e481dc38c2b8e797db169518d9874003cd20a5d27efaa8d28a1d5feb72be160000000000000002000000697ea2a8fe5b03468548a7a413424a6292ab44a82a6f5cc594c3fa7dda7ce4020000000000000000000000000000000000000000000000000000000000000000",
    "proof": "0xc700000000000000c80000000000000014b68f2eecca814fc9eca34ae08f78824a15413275e3511ac22b7f294a95723e352908fa2820c73b598acfc3a27405f24b22434e6f711143d173b7c71e1745ff16aba15c8f91f01c2286946e6a1fe933675d53b33469a104354015b0bca21674fe393d8cdf476e35f19f13136779767cfdc4d66c2f6244706326eb8666f9b233efe93552b2d7c4793d3afe172b3f82a817b15ff535f9af7919572f0346d496bff0"
  },
  "id": 1
}
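
The five outcomes above are consistent with a simple model, sketched here under stated assumptions rather than taken from pallet-mmr itself: a proof needs the leaf, the sibling subtree at every level inside the leaf's peak, and every other peak; and a node is assumed to be retrievable only if the last leaf underneath it lies outside the missing range 36-69, i.e. it was written while indexing was on. The leaf count is taken to equal the block height used in the calls above, so the exact boundary around 35/36 may be off by one. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Print, one per line, the index of the last leaf under every node that a
# proof for LEAF needs in an MMR holding N leaves (simplified model).
proof_node_ends() {
  local leaf=$1 n=$2 start=0 size=1 lo s half
  # Largest power of two <= n: the leaf count of the leftmost peak.
  while [ $((size * 2)) -le "$n" ]; do size=$((size * 2)); done
  while [ "$size" -ge 1 ]; do
    if [ $((n & size)) -ne 0 ]; then
      if [ "$leaf" -ge "$start" ] && [ "$leaf" -lt $((start + size)) ]; then
        # Peak containing the leaf: walk down, emitting each sibling subtree.
        lo=$start; s=$size
        while [ "$s" -gt 1 ]; do
          half=$((s / 2))
          if [ "$leaf" -lt $((lo + half)) ]; then
            echo $((lo + s - 1))      # sibling is the right half
          else
            echo $((lo + half - 1))   # sibling is the left half
            lo=$((lo + half))
          fi
          s=$half
        done
      else
        echo $((start + size - 1))    # every other peak is needed whole
      fi
      start=$((start + size))
    fi
    size=$((size / 2))
  done
}

# Succeeds (exit 0) if neither the leaf nor any needed node falls in the
# missing range MISS_LO..MISS_HI.
can_prove() {
  local leaf=$1 n=$2 miss_lo=$3 miss_hi=$4 end
  if [ "$leaf" -ge "$miss_lo" ] && [ "$leaf" -le "$miss_hi" ]; then return 1; fi
  for end in $(proof_node_ends "$leaf" "$n"); do
    if [ "$end" -ge "$miss_lo" ] && [ "$end" -le "$miss_hi" ]; then return 1; fi
  done
  return 0
}

for pair in "14 35" "14 200" "126 200" "128 200" "199 200"; do
  set -- $pair
  if can_prove "$1" "$2" 36 69; then
    echo "leaf=$1 leaves=$2: ok"
  else
    echo "leaf=$1 leaves=$2: fails"
  fi
done
```

In this model, leaf 14 at 200 leaves fails because its path needs the subtree over leaves 32-63, and leaf 126 fails because it needs the subtree over leaves 0-63. Leaves 128 and 199 only need the 0-127 peak as a whole, which was completed after indexing was reenabled.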

The equivalent of this last case for the broken rococo snapshot: if indexing were restarted at the next power of two above the current block height, i.e. 2^21, mmr_generateProof would work again for all proofs for blocks after that point. But we don't have time to test that ;)
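
As a quick check of that figure, the next power of two above a given height can be computed as below. The height 1394039 is the leaf index from the trace earlier in this thread, used here as a stand-in for the then-current block height; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the exponent and value of the smallest power of two strictly
# greater than the given number.
next_pow2_exp() {
  local n=$1 p=1 e=0
  while [ "$p" -le "$n" ]; do
    p=$((p * 2))
    e=$((e + 1))
  done
  echo "$e $p"
}

next_pow2_exp 1394039   # prints "21 2097152"
```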

Aug 10 '22 15:08 Lederstrumpf

Fixed by redeploying RPC nodes using correct snapshot.

MMR Proof generation RPC on Rococo:

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":[1353785, "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1"]}' https://rococo-rpc.polkadot.io | jq .

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1127  100   981  100   146   3483    518 --:--:-- --:--:-- --:--:--  4010
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1",
    "leaf": "0xc5010039a814005f855fb7851475b63af735b0fb3a9eda6c4cbb6f1c482a1c7d90c034cdbb63e6550900000000000044000000cbc76b9030a4afa18b157d3303ce4ccaf877cb371411e0187de90864ecdb3c263a847d778fa9abf061ad8c734feb2e04f549d15926e6490089903541d7cb0bd2",
    "proof": "0x39a81400000000003aa814000000000024fe578692f7cfefa34f7730676e65134832fed08c7df597583f401e37e4a6bbe91162cf5e4744a77c16f18da1872b62db903e9c67b080cec56960f20a39efe475e1ffc65d44f9095b659ba2d097355a04854f1db91f7861c3884322b798d3bbceeff67024d568f564a1b32cbf62b067f70462615ba39aa453866b8f4ea85a40b06fd88ceb18b50b6772200923bd1908ef0b9ab731157bb56f009d8c0fc9308892a05d055616a7262fd5ffd75aa0a1d23caf8e8c9581e2b011adca4c41f307c991a9452f153b0e63b20cee147dc0355ab3d0a22e0d8b888a8a2c7173b2e44cacb8fa41cc0bc98d1032e9860ae4368e6021a5e63bea28097c6fe230732c27ee26b1a0a64d53f135274681e94b17de116df3438489c46f117e3297975cee394e0a48"
  },
  "id": 1
}

Aug 17 '22 12:08 acatangiu