
devnet node fails to initiate next BABE epoch due to key missing from authority set

Open · noot opened this issue 4 years ago · 1 comment

Task summary

at around block 10000 on the devnet, the Alice node displayed the following error:

2021-11-27T15:54:55Z INFO epoch 110 complete, upcoming epoch: 111	babe.go:L442	pkg=babe
2021-11-27T15:54:55Z DBUG initiating epoch 111	epoch.go:L18	pkg=babe
2021-11-27T15:54:57Z EROR failed to initiate epoch 111: key not in BABE authority data	babe.go:L351	pkg=babe
2021-11-27T15:54:57Z CRIT block authoring error: key not in BABE authority data	babe.go:L337	pkg=babe

this caused block production to cease on this node (the other nodes were unaffected).

this should not happen, as the gssmr runtime is set to have the same authority set every epoch (see https://github.com/noot/substrate/commit/f462f52b22ab5b14c08d9895cdc3cba3ca7ca507)

determine why this is happening and fix it. additionally, if BABE fails to initiate an epoch, it should retry at the next epoch instead of returning and stopping block production entirely.
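
a minimal sketch of the requested retry behaviour, not gossamer's actual code: `runEpochs`, `initiateFunc` and `errKeyNotInAuthorities` are hypothetical names made up for illustration, and the loop steps through epoch numbers directly, whereas a real node would wait for the next epoch to start before trying again.

```go
package main

import (
	"errors"
	"fmt"
)

// errKeyNotInAuthorities is a hypothetical stand-in for the
// "key not in BABE authority data" error shown in the log above.
var errKeyNotInAuthorities = errors.New("key not in BABE authority data")

// initiateFunc is a hypothetical hook that prepares one epoch for authoring.
type initiateFunc func(epoch uint64) error

// runEpochs tries to initiate every epoch in [first, last]. A failure skips
// authoring for that epoch only; the loop then tries again at the next epoch
// instead of returning. It returns the epochs that were initiated.
func runEpochs(first, last uint64, initiate initiateFunc) []uint64 {
	var initiated []uint64
	for epoch := first; epoch <= last; epoch++ {
		if err := initiate(epoch); err != nil {
			fmt.Printf("failed to initiate epoch %d: %v; retrying at next epoch\n", epoch, err)
			continue
		}
		initiated = append(initiated, epoch)
	}
	return initiated
}

func main() {
	// Simulate the failure from the log: only epoch 111 cannot be initiated.
	initiate := func(epoch uint64) error {
		if epoch == 111 {
			return errKeyNotInAuthorities
		}
		return nil
	}
	fmt.Println(runEpochs(110, 113, initiate))
}
```

with this shape a single bad epoch costs the node one epoch of authoring rather than halting it until restart.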

noot · Nov 29 '21 21:11

I believe this issue is related to #2098: when I turned off peer banning and ran the devnet, I did not see it. the peer count going to 0 could potentially cause this through a missing NextEpochData digest, but I'll need to re-run the devnet to confirm that the two are definitely related.

noot · Jan 11 '22 18:01