nimbus-eth2

Nil deref in eth1_monitor on mergeForkBlock

Open z3n-chada opened this issue 2 years ago • 5 comments

Describe the bug: NilDeref in getPayload from eth1_monitor.nim


nimbus-client-besu-0  | beacon_chain/nimbus_beacon_node.nim(2117) main
nimbus-client-besu-0  | beacon_chain/nimbus_beacon_node.nim(1985) handleStartUpCmd
nimbus-client-besu-0  | beacon_chain/nimbus_beacon_node.nim(1804) doRunBeaconNode
nimbus-client-besu-0  | beacon_chain/nimbus_beacon_node.nim(1597) start
nimbus-client-besu-0  | beacon_chain/nimbus_beacon_node.nim(1541) run
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncloop.nim(279) poll
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncfutures2.nim(394) internalContinue
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
nimbus-client-besu-0  | beacon_chain/validators/validator_duties.nim(485) getExecutionPayload
nimbus-client-besu-0  | beacon_chain/validators/validator_duties.nim(448) get_execution_payload
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
nimbus-client-besu-0  | beacon_chain/validators/validator_duties.nim(454) get_execution_payload
nimbus-client-besu-0  | beacon_chain/eth1/eth1_monitor.nim(466) getPayload
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncfutures2.nim(219) complete
nimbus-client-besu-0  | vendor/nim-chronos/chronos/asyncfutures2.nim(149) cancelled
nimbus-client-besu-0  | vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(610) signalHandler
nimbus-client-besu-0  | SIGSEGV: Illegal storage access. (Attempt to read from nil?)

To Reproduce Steps to reproduce the behavior:

  1. Linux x86
  2. Branch/commit used: kiln-dev-auth
  3. Commands being executed:

 nimbus_beacon_node \
      --non-interactive \
      --data-dir="$NODE_DIR" \
      --log-file="$NODE_DIR/beacon-log.txt" \
      --network="$TESTNET_DIR" \
      --secrets-dir="$NODE_DIR/secrets" \
      --validators-dir="$NODE_DIR/keys" \
      --rpc \
      --rpc-address="0.0.0.0" --rpc-port="$BEACON_RPC_PORT" \
      --rest \
      --rest-address="0.0.0.0" --rest-port="$BEACON_API_PORT" \
      --listen-address="$IP_ADDR" \
      --tcp-port="$CONSENSUS_P2P_PORT" \
      --udp-port="$CONSENSUS_P2P_PORT" \
      --nat="extip:$IP_ADDR" \
      --discv5=true \
      --subscribe-all-subnets \
      --insecure-netkey-password \
      --netkey-file="$NODE_DIR/netkey-file.txt" \
      --graffiti="nimbus-kilnv2:$IP_ADDR" \
      --in-process-validators=true \
      --doppelganger-detection=true \
      --jwt-secret=$JWT_SECRET_FILE --web3-url=ws://127.0.0.1:$EXECUTION_AUTH_WS_PORT \
      --bootstrap-node="$bootnode_enr"
  4. Relevant log lines: '...'

Screenshots If applicable, add screenshots to help explain your problem.

Additional context: Has happened twice, at random, on the Bellatrix mergeForkBlock. Observed with both Besu and Geth; subsequent runs show that the crash does not always occur.

z3n-chada avatar Apr 15 '22 04:04 z3n-chada
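Since the crash is intermittent, one way to chase it is to rerun the same setup several times and stop at the first run whose console output contains the SIGSEGV line. The following is only a minimal sketch: `run_until_segv` is a hypothetical helper, not part of nimbus-eth2, and it assumes the wrapped command exits on its own (for example, a testnet script that tears the node down after the merge fork block).

```shell
# Hypothetical helper (not part of nimbus-eth2).
# Usage: run_until_segv MAX_RUNS OUT_FILE COMMAND [ARGS...]
# Reruns COMMAND until its combined stdout/stderr contains "SIGSEGV".
# Assumption: COMMAND terminates by itself on every run.
run_until_segv() {
  run_max=$1
  run_out=$2
  shift 2
  run_i=1
  while [ "$run_i" -le "$run_max" ]; do
    # Capture both streams; the trace in the report came from console output.
    "$@" >"$run_out" 2>&1
    if grep -q 'SIGSEGV' "$run_out"; then
      echo "crash reproduced on run $run_i"
      return 0
    fi
    run_i=$((run_i + 1))
  done
  echo "no crash in $run_max runs"
  return 1
}
```

For example, `run_until_segv 10 /tmp/node-output.txt ./run_node.sh`, where `./run_node.sh` is a placeholder for a script wrapping the `nimbus_beacon_node` command above. The output file of the crashing run is left in place for inspection.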

https://github.com/status-im/nimbus-eth2/pull/3494/commits/6474a90be8860bb630c1002595babacd7e8bd858 guards against this, so it shouldn't outright crash, though it's still unclear how it'd get into a state with nil Eth1Monitor to begin with.

tersec avatar Apr 15 '22 11:04 tersec

I'll rerun on an updated kiln-dev-auth. I believe I was running a version before that commit.

z3n-chada avatar Apr 16 '22 19:04 z3n-chada

Other than trace logs, is there another feature for debugging besides compiling with print statements around the area?

z3n-chada avatar Apr 16 '22 19:04 z3n-chada

https://github.com/status-im/nimbus-eth2/pull/3600

tersec avatar Apr 28 '22 04:04 tersec

> Other than trace logs, is there another feature for debugging besides compiling with print statements around the area?

That's what I typically do -- well, rather, I add logging as needed. So, for example, I haven't tried to track this down in much detail yet, but I would probably trace the Eth1Monitor back to where it was supposed to be initialized and determine why it wasn't.

That's a decent aspect of this bug, that it should be something happening on init, so if it happens, it should be visible early.

tersec avatar Apr 28 '22 04:04 tersec
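When deciding where to add such logging, it can help to reduce the crash trace to the frames from the client's own sources. The snippet below is a rough sketch, assuming the trace format shown in the report (a container-name prefix ending in `|`, then `path.nim(line) proc`); `own_frames` is a hypothetical helper name.

```shell
# Hypothetical helper: read a stack trace on stdin and keep only frames whose
# path contains beacon_chain/, dropping vendored frames (chronos, Nim runtime)
# and stripping the "container-name |" prefix.
own_frames() {
  grep -E 'beacon_chain/[^ ]+\.nim\([0-9]+\)' | sed -E 's/^.*\| *//'
}
```

Piping the trace from the report through `own_frames` leaves the `nimbus_beacon_node.nim`, `validator_duties.nim`, and `eth1_monitor.nim` frames, which are the candidate places for extra log statements.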

Fixed as of the merge mainnet release through a series of refactorings

arnetheduck avatar Oct 24 '22 10:10 arnetheduck