geth stops syncing: "failed to notify engine driver of new L2 payload"
t=2024-05-13T05:47:29+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2024-05-13T05:47:32+0000 lvl=info msg="connected to peer" peer=16Uiu2HAkwiQMsN8zDguThzbff5cyk3L31wg1b8N6EhodFbzgDr1v addr=/ip4/3.95.2.220/tcp/46443
t=2024-05-13T05:47:32+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAkwiQMsN8zDguThzbff5cyk3L31wg1b8N6EhodFbzgDr1v
t=2024-05-13T05:47:41+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2024-05-13T05:47:41+0000 lvl=info msg="connected to peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/52342
t=2024-05-13T05:47:41+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA
t=2024-05-13T05:47:41+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/52342
t=2024-05-13T05:47:45+0000 lvl=info msg="connected to peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/35842
t=2024-05-13T05:47:45+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA
t=2024-05-13T05:47:45+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/35842
t=2024-05-13T05:47:47+0000 lvl=warn msg="failed to notify engine driver of new L2 payload" err="context deadline exceeded" id=0x7ace6766be008380df7a20582aba5d624a6c01d2b192c3a1050531abf021d477:14394465
Seeing this more and more these days. I've never seen it on Optimism, but I have on Base (0.82 and 0.83) and on Blast (both chains are forked from Optimism). Some say it's related to forcefully stopping a Docker container, but I have my doubts about that. Fixing this without a resync would be great.
Seeing this on Blast rn. Never had this before.
Same issue on OP mainnet from time to time (3-4 times a day).
Same/similar issue discussed here - https://github.com/ethereum-optimism/developers/discussions/263
You can probably get around this without a resync by rolling back geth and then setting the forkchoice. I looked at my graphs to see when the node was last healthy and subtracted a number of blocks for good measure. For example, here I wanted to roll back to block 119000000 by attaching to geth while op-node was turned off:
docker exec -it op-geth geth attach /geth/geth.ipc
debug.setHead("0x717CBC0")
Wait until the call returns null.
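To see where the head actually ended up after setHead (it can land further back than the target, which is what happened to me below), you can also query the block number non-interactively. A minimal sketch, assuming the same container name and IPC path as above:
# ask geth for its current head block number via the console's --exec flag
docker exec -it op-geth geth attach --exec 'eth.blockNumber' /geth/geth.ipc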
setHead went a little further back than the target, so when I tried setting the forkchoice to 119000000 I got an error about it being greater than the latest block. So I re-ran set-forkchoice with a block a little further back, before what the error said was the latest.
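If you'd rather query the current head directly instead of reading it out of the error, something like this against geth's JSON-RPC works. This assumes the HTTP endpoint is exposed on localhost:8545; adjust host/port to your setup:
# geth reports the head as a hex quantity in the "result" field
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545
# convert the returned hex to decimal to pick a forkchoice target below it, e.g. 0x7178958 -> 118983000
printf '%d\n' 0x7178958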
docker exec -it op-geth /app/op-w/bin/op-wheel engine set-forkchoice --unsafe=118983000 --safe=118983000 --finalized=118983000 --engine=http://localhost:8551 --engine.jwt-secret=/shared/jwt.txt
restart op-node
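To confirm it's actually catching up after the restart, I'd watch the heads op-node reports. A rough sketch against op-node's RPC, assuming it listens on the default port 9545 (adjust if you changed it):
# optimism_syncStatus shows the unsafe/safe/finalized L2 heads op-node is tracking;
# the unsafe head number should keep climbing once sync has resumed
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}' \
  http://localhost:9545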
I did not think the above instructions were going to work based on previous attempts to roll back a chain, but I left it running overnight and sure enough I was able to get my node to start syncing again.
My image did not have op-wheel in it, so I had to build it myself, but it worked.
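For reference, building it looks roughly like this. The path is from the ethereum-optimism/optimism monorepo as I remember it and may differ between releases, so treat this as a sketch rather than exact instructions:
git clone https://github.com/ethereum-optimism/optimism.git
cd optimism
# op-wheel's entrypoint lives under op-wheel/cmd in the monorepo (location may vary by release)
go build -o ./bin/op-wheel ./op-wheel/cmd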
edit: I experienced this on both Base and Optimism, in case anyone is keeping track. All it took was one maintenance reboot to cause it, and it wasn't an unclean shutdown. My containers are configured to allow up to 15 minutes to shut down gracefully.
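For anyone tuning that: Docker's default stop timeout is only 10 seconds before it sends SIGKILL, so you have to raise it explicitly. A sketch with plain docker (container names are just what mine are called; compose users can set stop_grace_period on the service instead):
# allow up to 15 minutes (900s) for a clean shutdown before Docker sends SIGKILL
docker stop -t 900 op-geth
docker stop -t 900 op-node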