
geth stops syncing: failed to notify engine driver of new L2 payload

Open · yydsqu opened this issue · 5 comments

t=2024-05-13T05:47:29+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2024-05-13T05:47:32+0000 lvl=info msg="connected to peer" peer=16Uiu2HAkwiQMsN8zDguThzbff5cyk3L31wg1b8N6EhodFbzgDr1v addr=/ip4/3.95.2.220/tcp/46443
t=2024-05-13T05:47:32+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAkwiQMsN8zDguThzbff5cyk3L31wg1b8N6EhodFbzgDr1v
t=2024-05-13T05:47:41+0000 lvl=warn msg="failed to notify engine driver of L1 head change" err="context deadline exceeded"
t=2024-05-13T05:47:41+0000 lvl=info msg="connected to peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/52342
t=2024-05-13T05:47:41+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA
t=2024-05-13T05:47:41+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/52342
t=2024-05-13T05:47:45+0000 lvl=info msg="connected to peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/35842
t=2024-05-13T05:47:45+0000 lvl=info msg="Starting P2P sync client event loop" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA
t=2024-05-13T05:47:45+0000 lvl=info msg="disconnected from peer" peer=16Uiu2HAmAqMJngGnV3iJq7jbdfBxA7XqQxGWDnpFfHL77AjsVZXA addr=/ip4/172.208.116.132/tcp/35842
t=2024-05-13T05:47:47+0000 lvl=warn msg="failed to notify engine driver of new L2 payload" err="context deadline exceeded" id=0x7ace6766be008380df7a20582aba5d624a6c01d2b192c3a1050531abf021d477:14394465
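One quick sanity check while these timeouts are happening is whether op-geth's RPC is responding at all; a plain eth_syncing call works (this assumes the HTTP RPC is exposed on localhost:8545 — adjust host/port to your setup):

curl -s -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' http://localhost:8545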

yydsqu · May 13 '24 05:05

Seeing this more and more these days. I've never seen it on Optimism, but I have on Base (0.82 and 0.83) and on Blast (both chains are forked from Optimism). Some say it's related to forcefully stopping a Docker container, but I have my doubts about that. Fixing this without a resync would be great.
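If the forced-stop theory holds, one easy mitigation is to give the container more time to shut down before Docker sends SIGKILL (the default grace period is only 10 seconds; the container name here is assumed):

docker stop --time 900 op-geth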

SwatGuard · May 22 '24 12:05

Seeing this on Blast right now. I've never had this before.

sahfd4eiugf · Jun 04 '24 14:06

Same issue on OP mainnet from time to time (3-4 times a day). A similar issue is discussed here: https://github.com/ethereum-optimism/developers/discussions/263

jajaislanina · Jun 13 '24 15:06

You can probably get around this without a resync by rolling back geth and then setting the forkchoice. I look at my graphs to see when the node was last healthy and subtract a number of blocks for good measure. For example, here I wanted to roll back to block 119000000, so I attached to geth while op-node was turned off:

docker exec -it op-geth geth attach /geth/geth.ipc

Then, in the geth console:

debug.setHead("0x717CBC0")

Wait until you see null (0x717CBC0 is 119000000 in hex).
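If you need the hex form of a different block number for debug.setHead, any converter works; for example, a plain shell one-liner:

printf '0x%X\n' 119000000   # prints 0x717CBC0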

It went a little further back than I asked for, so when I tried setting the forkchoice to 119000000 I got an error saying it was greater than the latest block. So for the set-forkchoice call I picked a block a little below what the error reported as the latest.
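If you want to confirm what geth currently reports as its head before picking a target, a plain JSON-RPC query does it (again assuming the HTTP RPC is exposed on localhost:8545 — adjust to your setup):

curl -s -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://localhost:8545

With a target safely below that, set the forkchoice: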

docker exec -it op-geth /app/op-w/bin/op-wheel engine set-forkchoice --unsafe=118983000 --safe=118983000 --finalized=118983000 --engine=http://localhost:8551 --engine.jwt-secret=/shared/jwt.txt

Then restart op-node.

icculp · Jul 04 '24 20:07

I did not think the above instructions were going to work based on previous attempts to roll back a chain, but I left it running overnight and sure enough I was able to get my node to start syncing again.

My image did not have op-wheel in it, so I had to build it myself, but it worked.
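For anyone else in the same spot, a rough sketch of one way to build it — assuming op-wheel still lives under op-wheel/cmd in the ethereum-optimism/optimism monorepo, so check the repo layout before relying on the exact path:

git clone https://github.com/ethereum-optimism/optimism.git
cd optimism/op-wheel
go build -o op-wheel ./cmd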

Edit: I experienced this on both Base and Optimism, in case anyone is keeping track. All it took was one maintenance reboot to cause it, and it wasn't an unclean shutdown; my containers are configured to allow up to 15 minutes to shut down gracefully.

feld · Jul 18 '24 12:07