nimbus-eth2
Nil deref in eth1_monitor on mergeForkBlock
Describe the bug: NilDeref in getPayload from eth1_monitor.nim
nimbus-client-besu-0 | beacon_chain/nimbus_beacon_node.nim(2117) main
nimbus-client-besu-0 | beacon_chain/nimbus_beacon_node.nim(1985) handleStartUpCmd
nimbus-client-besu-0 | beacon_chain/nimbus_beacon_node.nim(1804) doRunBeaconNode
nimbus-client-besu-0 | beacon_chain/nimbus_beacon_node.nim(1597) start
nimbus-client-besu-0 | beacon_chain/nimbus_beacon_node.nim(1541) run
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncloop.nim(279) poll
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncfutures2.nim(394) internalContinue
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
nimbus-client-besu-0 | beacon_chain/validators/validator_duties.nim(485) getExecutionPayload
nimbus-client-besu-0 | beacon_chain/validators/validator_duties.nim(448) get_execution_payload
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
nimbus-client-besu-0 | beacon_chain/validators/validator_duties.nim(454) get_execution_payload
nimbus-client-besu-0 | beacon_chain/eth1/eth1_monitor.nim(466) getPayload
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncfutures2.nim(219) complete
nimbus-client-besu-0 | vendor/nim-chronos/chronos/asyncfutures2.nim(149) cancelled
nimbus-client-besu-0 | vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(610) signalHandler
nimbus-client-besu-0 | SIGSEGV: Illegal storage access. (Attempt to read from nil?)
To Reproduce Steps to reproduce the behavior:
- Linux x86
- Branch/commit used: kiln-dev-auth
- Commands being executed:
nimbus_beacon_node \
--non-interactive \
--data-dir="$NODE_DIR" \
--log-file="$NODE_DIR/beacon-log.txt" \
--network="$TESTNET_DIR" \
--secrets-dir="$NODE_DIR/secrets" \
--validators-dir="$NODE_DIR/keys" \
--rpc \
--rpc-address="0.0.0.0" --rpc-port="$BEACON_RPC_PORT" \
--rest \
--rest-address="0.0.0.0" --rest-port="$BEACON_API_PORT" \
--listen-address="$IP_ADDR" \
--tcp-port="$CONSENSUS_P2P_PORT" \
--udp-port="$CONSENSUS_P2P_PORT" \
--nat="extip:$IP_ADDR" \
--discv5=true \
--subscribe-all-subnets \
--insecure-netkey-password \
--netkey-file="$NODE_DIR/netkey-file.txt" \
--graffiti="nimbus-kilnv2:$IP_ADDR" \
--in-process-validators=true \
--doppelganger-detection=true \
--jwt-secret=$JWT_SECRET_FILE --web3-url=ws://127.0.0.1:$EXECUTION_AUTH_WS_PORT \
--bootstrap-node="$bootnode_enr"
- Relevant log lines: '...'
Additional context: This has happened twice, seemingly at random, on the Bellatrix mergeForkBlock. I have observed it with both Besu and Geth, and subsequent runs show that the crash does not always occur.
https://github.com/status-im/nimbus-eth2/pull/3494/commits/6474a90be8860bb630c1002595babacd7e8bd858 guards against this, so it shouldn't outright crash, though it's still unclear how it'd get into a state with a nil Eth1Monitor to begin with.
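For readers unfamiliar with this kind of guard, here is a minimal, self-contained sketch of the general pattern: check a `ref` for nil before dereferencing it and return an empty result instead of crashing. This is not the code from the linked commit. `Eth1MonitorSketch`, `getPayloadSketch`, the `web3Url` field, and the example URL are hypothetical names made up for illustration, and the real `getPayload` is async, which this sketch leaves out.

```nim
import std/options

type
  # Hypothetical stand-in for the real Eth1Monitor; the field is illustrative only.
  Eth1MonitorSketch = ref object
    web3Url: string

proc getPayloadSketch(m: Eth1MonitorSketch, payloadId: int): Option[string] =
  ## Returns none(string) instead of dereferencing a nil monitor.
  if m.isNil:
    return none(string)
  # Safe to read fields here because m is known to be non-nil.
  some("payload " & $payloadId & " via " & m.web3Url)

when isMainModule:
  var monitor: Eth1MonitorSketch            # ref objects default to nil
  doAssert getPayloadSketch(monitor, 1).isNone

  # Placeholder URL, not taken from the report above.
  monitor = Eth1MonitorSketch(web3Url: "ws://127.0.0.1:8551")
  doAssert getPayloadSketch(monitor, 1).isSome
```

A guard like this only turns the SIGSEGV into a recoverable "no payload" result. It does not explain why the monitor was nil in the first place.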
I'll rerun on an updated kiln-dev-auth. I believe I was running a version from before that commit.
Other than trace logs, is there another feature for debugging besides compiling with print statements around the area?
https://github.com/status-im/nimbus-eth2/pull/3600
> Other than trace logs, is there another feature for debugging besides compiling with print statements around the area?
That's what I typically do, or rather, I add logging as needed. For example, I haven't tried to track this down in detail yet, but I would probably trace the Eth1Monitor back to where it was supposed to be initialized and determine why it wasn't.
One helpful aspect of this bug is that it should originate during init, so if it happens, it should be visible early.
Fixed as of the merge mainnet release, through a series of refactorings.