nimbus-eth2
Parse JSON-RPC responses more flexibly
When something is wrong with eth1 monitoring, for example when the EL is still syncing, we print an unintelligible parse error:
Eth1 chain monitoring failure, restarting topics="eth1" err="getBlockByHash(m.dataProvider,\n BlockHash(m.depositsChain.finalizedBlockHash.data)) failed 3 times. Last error: Parameter [result] expected JObject but got JNull"
Our JSON parsing should be flexible enough to deal with different shapes of responses and give better diagnostics in such cases. In particular, it should be dynamic enough to parse different "shapes" of JSON error responses, retaining the original message if conversion to an object fails. Alternatively, it could skip automatic to-object conversion entirely, use dynamic JsonNodes all the way, and extract information using JSON paths, as jq does.
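To illustrate the kind of handling meant here, below is a minimal sketch in Python rather than Nim. It is not the nimbus-eth2 implementation: `RpcError`, `describe_error` and `extract_block` are hypothetical names, and the set of response shapes it covers (a null result, an error object, a bare error string, invalid JSON) is an assumption about what an EL might return.

```python
import json
from typing import Any, Optional


class RpcError(Exception):
    """Readable failure description that keeps the raw response text."""


def describe_error(err: Any) -> str:
    # Assumption: error payloads vary between clients. Handle the common
    # {"code": ..., "message": ...} object and fall back to str() otherwise.
    if isinstance(err, dict):
        code = err.get("code", "?")
        message = err.get("message", json.dumps(err))
        return f"code={code} message={message}"
    return str(err)


def extract_block(raw: str) -> Optional[dict]:
    """Return the block object, or None when the EL answers with a null result."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RpcError(f"invalid JSON ({exc.msg}); raw response: {raw!r}") from exc

    if not isinstance(doc, dict):
        raise RpcError(f"expected a JSON object; raw response: {raw!r}")

    if doc.get("error") is not None:
        detail = describe_error(doc["error"])
        raise RpcError(f"EL returned an error: {detail}; raw response: {raw!r}")

    result = doc.get("result")
    if result is None:
        # A null result is a valid answer ("no such block"), not a parse failure.
        return None
    if not isinstance(result, dict):
        raise RpcError(f"unexpected result type; raw response: {raw!r}")
    return result
```

With this shape, a null result becomes an ordinary "block not known yet" case that the caller can report in plain words, and every failure path carries the original response text.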
I've seen this when setting up Nimbus Beacon Node with an Erigon node.
Nov 28 22:14:40 protoalpha nimbus_beacon_node[63512]: WRN 2022-11-28 22:14:40.692+01:00 Eth1 chain monitoring failure, restarting topics="eth1" err="getBlockByHash(m.dataProvider,\n BlockHash(m.depositsChain.finalizedBlockHash.data)) failed 3 times. Last error: Parameter [result] expected JObject but got JNull" node=goerlinimbus
I hit the same error when syncing Nimbus (without a beacon checkpoint state) against a new Geth node.
This happens because Geth does not set the finalized or safe blocks before the first forkchoice update engine API call from nimbus, and it does not default to the genesis block.
And this forkchoice update engine API call does not happen during initial syncing without beacon snapshot state.
A fix for this would be to call the forkchoice update method with the latest forkchoice values after connecting to a new engine API endpoint. It may also make sense to call it for every payload that's inserted during sync (if that's not already the case; I can't check right now, because it fails on this problem first).
Edit: the forkchoice update has to be post-merge however, otherwise geth ignores it.
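For reference, a rough sketch of what the request for such a call could look like, written in Python for brevity. The method name and the three forkchoice state fields follow the Engine API's `engine_forkchoiceUpdatedV1`; the helper name `forkchoice_updated_request` is hypothetical, the hashes in the usage lines are placeholders, and transport plus JWT authentication are left out. This is not how nimbus-eth2 issues the call.

```python
import json


def forkchoice_updated_request(head: str, safe: str, finalized: str,
                               request_id: int = 1) -> dict:
    """Build a JSON-RPC request body for engine_forkchoiceUpdatedV1."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "engine_forkchoiceUpdatedV1",
        "params": [
            {
                "headBlockHash": head,
                "safeBlockHash": safe,
                "finalizedBlockHash": finalized,
            },
            # No payload attributes: only update the forkchoice state,
            # do not ask the EL to start building a block.
            None,
        ],
    }


# Placeholder hashes for illustration only.
head = "0x" + "11" * 32
finalized = "0x" + "22" * 32
body = json.dumps(forkchoice_updated_request(head, finalized, finalized))
```

Sending this once after connecting to a new endpoint would give the EL finalized and safe values to answer with, subject to the post-merge caveat in the edit above.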