MMR Proof Generation is broken
Is there an existing issue?
- [X] I have searched the existing issues
Experiencing problems? Have you tried our Stack Exchange first?
- [X] This is not a support question.
Description of bug
Shortly after deploying polkadot:v0.9.27, Rococo fails to generate MMR proofs via the mmr RPC, for example on block 1353786:
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":[1353785, "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1"]}' https://rococo-rpc.polkadot.io | jq .
{
  "jsonrpc": "2.0",
  "error": {
    "code": 8012,
    "message": "Error while generating the proof",
    "data": "Error::GenerateProof"
  },
  "id": 1
}
Steps to reproduce
- Open polkadot.js/apps
- Connect to Rococo
- Go to Developer -> RPC Calls -> mmr -> generateProof
- Call it with leafIndex=
It will produce the error:
8012: Error while generating the proof: Error::GenerateProof
ckb-merkle-mountain-range-0.3.2, used by pallet-mmr, is throwing an InconsistentStore error when trying to generate proofs:
2022-08-05 13:50:45.623 ERROR tokio-runtime-worker runtime::mmr: [<wasm:stripped>] MMR error: InconsistentStore
Need to dive deeper for the root cause, but the likely culprit is the recent PR https://github.com/paritytech/substrate/pull/11594
Maybe I'm missing something, but I don't think that's the cause since the onchain runtime version on rococo is still 9250, which barely precedes inclusion of that PR.
I've tried this for a couple thousand blocks, and the InconsistentStore error is consistently instantiated at this line: https://github.com/nervosnetwork/merkle-mountain-range/blob/master/src/mmr.rs#L125
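For reference, that sweep can be reproduced with a loop along these lines; the endpoint, anchor hash and leaf range below are just illustrative (the hash is the one from the report above, so every call is anchored at the same block):
rpc="https://rococo-rpc.polkadot.io"
at="0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1"
# Request a proof for each leaf index and print "ok" or the error data for each call.
for leaf in $(seq 1353000 1353785); do
  result=$(curl -s -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":['$leaf', "'$at'"]}' "$rpc" | jq -r '.error.data // "ok"')
  echo "leaf $leaf: $result"
done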
I've attached a trace log of a call to mmr_generateProof with params [1394039, 0xdf388aada7be7d84bd5f9834b6480d2cd6afa6c298f12e9ff8fe9754a5c8e5e2]:
rococo_generateProof_1394039_0xdf388aada7be7d84bd5f9834b6480d2cd6afa6c298f12e9ff8fe9754a5c8e5e2.log
Running a node from genesis doesn't exhibit this problem; only nodes started from a snapshot have this broken-db issue.
Unfortunately we only have snapshots from the last 5 days and they are all broken - all current nodes have been semi-recently upgraded and restarted from a broken snapshot, so we have no older, good snapshot.
The problem is that the offchainDB is incomplete: it holds no entries for old leaves (such as leaf indices 0, 1, 3, 1000000), only for leaves pertaining to blocks added after the node was started from the snapshot.
This is most likely because the node was started from an incomplete snapshot - one generated on a node running without the --enable-offchain-indexing=true parameter, which therefore had no offchain db entries for MMR leaves.
To fix this, it should be enough to restart the nodes from a good snapshot - one generated from an archive node running with --enable-offchain-indexing=true.
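For reference, a sketch of how such an archive node could be launched; the binary name, base path and exact flag spelling are assumptions and may differ between releases, the essential part is --enable-offchain-indexing=true so MMR leaves are written to the offchain DB on every block import:
# Illustrative archive-node invocation (adjust flags to your node version):
polkadot --chain rococo --pruning archive --enable-offchain-indexing=true --base-path /data/rococo-archive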
The Rococo sync nodes have been restarted with the correct flags and are currently re-syncing.
Can report that it's working on rococo for me too when syncing from scratch with --enable-offchain-indexing=true enabled from the beginning.
In case we run into this again: I've debugged with a local chain and confirmed that proofs can still be generated from an incomplete offchainDB, as long as all leaves required for the proof are present.
Using a local chain where I:
- enabled indexing until block 35,
- disabled it until block 70, and then
- re-enabled it until block 268,
i.e. with entries for leaves 36-69 missing in the offchainDB, these are the failure/success scenarios I get:
(Feel free to skip the remainder of this comment - I'm just keeping a detailed record in case proof generation breaks again. Here's an archive of the associated chain state: issue-11984-interrupted-indexing-state.tar.gz)
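For completeness, the interrupted-indexing setup described above can be recreated roughly like this; the binary, base path and exact stopping heights are assumptions, the point being that offchain indexing only applies to blocks imported while the flag is set:
# 1. Run a dev chain with offchain indexing on; stop around block 35.
polkadot --dev --base-path /tmp/mmr-test --enable-offchain-indexing=true
# 2. Restart on the same base path with indexing off (the default); stop around block 70.
polkadot --dev --base-path /tmp/mmr-test
# 3. Restart with indexing on again and let it run past block 268.
polkadot --dev --base-path /tmp/mmr-test --enable-offchain-indexing=true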
A. leaf_index ≤ 35
For mmr_generateProof called for leaves 0-35, it works as long as the mmr_size is at most 35, since otherwise the leaf's copath to the root contains at least one leaf with index 36-69.
mmr_size ≤ 35 (leaf indexed and full path available)
block_height=35
method="chain_getBlockHash"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$block_height']}' 65.108.96.98:9934 | jq .result
"0x642476aaddbbc65dc589cc2d801d8973dc5a394698c9860e17788531ab77dc36"
leaf_index=14;
block_35_hash="0x642476aaddbbc65dc589cc2d801d8973dc5a394698c9860e17788531ab77dc36"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_35_hash'"]}' 65.108.96.98:9934 | jq .result.proof
"0x0e00000000000000230000000000000018b4c9b914a645c24e6d056edf8f905bb6ae80078aadc5b9ea77c2ba4ef45c835afa0360a215717f7b37aae3f96b4d38166bec3ddb412d7e9aa5ae2867783344f1ed33d4c3b7e34902581e2870bb04884b2aabecdc280e73e796aed8480d16391f1d259475eb1383cb48989a575e6ed9c83e5289c9647309217c00b825a2da763a04de585a62641e1ee66573e8e02c6fcad042cc0c9e27e3991ee3b3917cddb5ce239bbe1cc0ab39c0cf00c334a54765faeaeae0740ee531751d432044d7da19d6"
mmr_size > 35 (leaf indexed but full path not available)
block_height=200
method="chain_getBlockHash"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$block_height']}' 65.108.96.98:9934 | jq .result
"0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
leaf_index=14;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .error
{
  "code": 8012,
  "message": "Error while generating the proof",
  "data": "Error::GenerateProof"
}
B. leaf_index > 35
35 < leaf_index ≤ 127 (leaf indexed but full path not available)
mmr_generateProof always fails since the leaf's copath to the root contains at least one leaf with index 36-69.
leaf_index=126;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .error
{
  "code": 8012,
  "message": "Error while generating the proof",
  "data": "Error::GenerateProof"
}
leaf_index > 127 (leaf indexed and full path available)
Despite the db being incomplete, mmr_generateProof succeeds, since the leftmost peak now covers 128 leaves (not 32 or 64), so all leaves required on the copath can be found.
leaf_index=128;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209",
    "leaf": "0xc5010080000000a1425cc134f8019e8d8afcab8d1adc0328a90c5d84c359b7f448aa81d924c7c40f0000000000000002000000697ea2a8fe5b03468548a7a413424a6292ab44a82a6f5cc594c3fa7dda7ce4020000000000000000000000000000000000000000000000000000000000000000",
    "proof": "0x8000000000000000c80000000000000020b68f2eecca814fc9eca34ae08f78824a15413275e3511ac22b7f294a95723e356eb85b6931870b476750cc14799a0ff6c2c8d570f2d9953f2256ac117b7813bec36d9f2084db0961c915c75ac61754cbd7e8959e7666d16ffd21a18819f9024f5a11883fdd7cf413a2b7c532e1923432ce5eff73a717792d47d59e366532fe404f3a59caa48cdd3694b0eb73dc73c168702fd67f11efbf159064ca6b522e54da03b22c78eda892112faa3d0b76479b1d82aff960db06576b1d8fd82eae1b461fa8815288087eb7c93e88c1a94fa0400b7081db364240eba8cdd14f0aed2c4f824477fd54800ffd8e8063fc904fc05db631af30c06f1ff8ad54e1e60570bf2f40"
  },
  "id": 1
}
leaf_index=199;
block_200_hash="0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209"
method="mmr_generateProof"
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "'$method'", "params":['$leaf_index', "'$block_200_hash'"]}' 65.108.96.98:9934 | jq .
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xb85ffec3f302c7b4fc7f6e7b59bdd248afa880da35384e1acf1329bbeb586209",
    "leaf": "0xc50100c70000005336e481dc38c2b8e797db169518d9874003cd20a5d27efaa8d28a1d5feb72be160000000000000002000000697ea2a8fe5b03468548a7a413424a6292ab44a82a6f5cc594c3fa7dda7ce4020000000000000000000000000000000000000000000000000000000000000000",
    "proof": "0xc700000000000000c80000000000000014b68f2eecca814fc9eca34ae08f78824a15413275e3511ac22b7f294a95723e352908fa2820c73b598acfc3a27405f24b22434e6f711143d173b7c71e1745ff16aba15c8f91f01c2286946e6a1fe933675d53b33469a104354015b0bca21674fe393d8cdf476e35f19f13136779767cfdc4d66c2f6244706326eb8666f9b233efe93552b2d7c4793d3afe172b3f82a817b15ff535f9af7919572f0346d496bff0"
  },
  "id": 1
}
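As a quick sanity check on the peak sizes used in the reasoning above (arithmetic only, not the crate's algorithm): the peaks of an MMR with 200 leaves follow the binary decomposition of 200, so leaves 0-127 sit under a 128-leaf peak, leaves 128-191 under a 64-leaf peak and leaves 192-199 under an 8-leaf peak.
# Print the peak sizes for n = 200 leaves (the set bits of n, largest first).
n=200
for ((b=31; b>=0; b--)); do
  if (( n & (1 << b) )); then printf '%d ' $((1 << b)); fi
done
echo            # -> 128 64 8
echo $((2**21)) # -> 2097152, the "next power of two" mentioned below for Rococo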
The equivalent of this last case for the broken Rococo snapshot would be that, if we started indexing again at the next power of two after the current block height, i.e. 2^21, mmr_generateProof would work again for all proofs for blocks after that one. But we don't have time to test that ;)
Fixed by redeploying the RPC nodes using a correct snapshot.
MMR Proof generation RPC on Rococo:
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "mmr_generateProof", "params":[1353785, "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1"]}' https://rococo-rpc.polkadot.io | jq .
{
  "jsonrpc": "2.0",
  "result": {
    "blockHash": "0xd5d732c6ce0cb177ae45bbeecaa97fc8e09b2d03aac2c2a594d8ee367b34dea1",
    "leaf": "0xc5010039a814005f855fb7851475b63af735b0fb3a9eda6c4cbb6f1c482a1c7d90c034cdbb63e6550900000000000044000000cbc76b9030a4afa18b157d3303ce4ccaf877cb371411e0187de90864ecdb3c263a847d778fa9abf061ad8c734feb2e04f549d15926e6490089903541d7cb0bd2",
    "proof": "0x39a81400000000003aa814000000000024fe578692f7cfefa34f7730676e65134832fed08c7df597583f401e37e4a6bbe91162cf5e4744a77c16f18da1872b62db903e9c67b080cec56960f20a39efe475e1ffc65d44f9095b659ba2d097355a04854f1db91f7861c3884322b798d3bbceeff67024d568f564a1b32cbf62b067f70462615ba39aa453866b8f4ea85a40b06fd88ceb18b50b6772200923bd1908ef0b9ab731157bb56f009d8c0fc9308892a05d055616a7262fd5ffd75aa0a1d23caf8e8c9581e2b011adca4c41f307c991a9452f153b0e63b20cee147dc0355ab3d0a22e0d8b888a8a2c7173b2e44cacb8fa41cc0bc98d1032e9860ae4368e6021a5e63bea28097c6fe230732c27ee26b1a0a64d53f135274681e94b17de116df3438489c46f117e3297975cee394e0a48"
  },
  "id": 1
}