go-ethereum New blocks are rejected because of setHead operation

System information

Geth version: all versions

Expected behaviour

Whenever SetHead is performed, the chain should be rewound to the specified position, and new blocks should then import on top of it smoothly.

Actual behaviour

SetHead may take a long time, during which it holds the blockchain lock inside the setHeadBeyondRoot function. In the meantime, the consensus layer keeps feeding us new blocks via the engine API. Specifically, in the newPayload method, the newly provided block passes all checks (e.g. the parent block exists, the parent state is available, etc.) and then blocks at InsertBlockWithoutSetHead, which requires the blockchain lock.

When SetHead finishes, the chain segment above the specified target has been removed entirely, including the parent block of the payload that just arrived via the engine API. Eventually an ErrUnknownAncestor error (defined as errors.New("unknown ancestor")) is returned, which marks the new payload as invalid.

What's more, the engine API has a mechanism that remembers bad blocks to avoid processing them over and over again. Fortunately, there is a threshold after which a "bad block" gets another chance. The threshold is currently 128, meaning that after 128 attempts the bad block is given another chance to import. But that is still too long in this case: the node has to wait a long time to recover.
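To make the cost of that threshold concrete, here is a rough sketch of a hit-counting bad block cache. The type and function names (`badBlockCache`, `markBad`, `shouldReject`, `rejectionsBeforeRetry`) are hypothetical, and the exact off-by-one behaviour at the threshold is an assumption of this sketch; only the value 128 comes from the description above.

```go
package main

// retryThreshold is the value of 128 quoted in the report.
const retryThreshold = 128

// badBlockCache counts how often a known-bad block hash has been seen.
// Hypothetical illustration, not the go-ethereum implementation.
type badBlockCache struct {
	hits map[string]int
}

func newBadBlockCache() *badBlockCache {
	return &badBlockCache{hits: make(map[string]int)}
}

// markBad starts tracking hash as a bad block with zero hits.
func (c *badBlockCache) markBad(hash string) {
	c.hits[hash] = 0
}

// shouldReject reports whether hash must be rejected without reprocessing.
// Once the hit counter reaches retryThreshold the entry is forgotten and
// the block gets another chance.
func (c *badBlockCache) shouldReject(hash string) bool {
	n, bad := c.hits[hash]
	if !bad {
		return false
	}
	if n+1 >= retryThreshold {
		delete(c.hits, hash)
		return false
	}
	c.hits[hash] = n + 1
	return true
}

// rejectionsBeforeRetry counts how many times a freshly marked block is
// turned away before the cache lets it through again.
func rejectionsBeforeRetry() int {
	c := newBadBlockCache()
	c.markBad("0xabc")
	count := 0
	for c.shouldReject("0xabc") {
		count++
	}
	return count
}
```

Under these assumptions a wrongly marked block is turned away 127 times before it is retried. If the consensus layer gives up after a handful of resends, the counter may never reach the threshold at all.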

Steps to reproduce the behaviour

Run debug.SetHead() when the node is already synced.

There are two possible directions for fixing this issue:

Avoid the error in the first place: the new payload should somehow be told that the parent block does not exist, and be queued in the future block queue.
Relax the restrictions on bad blocks for a faster recovery.

Feb 15 '23 03:02 rjl493456442

The bigger issue might not be our own bad block cache, but rather the bad block cache of the consensus layer. I've noticed that they might resend a block a handful of times (5) and then stop sending it. So they seem to mark it as bad as well.

Feb 15 '23 07:02 MariusVanDerWijden

The bigger issue might not be our own bad block cache

True, but that's not within scope of this specific ticket. The scenario described sounds like an error on our part. After a setHead, is it correct to return an ErrUnknownAncestor?

IMO, these two situations are semantically equivalent:

Node A syncs to M, then does setHead back to N (say two weeks back).
Node B syncs to N, then is shut off. It is restarted two weeks later.

So whatever one of them does, the other should do too.

Feb 15 '23 09:02 holiman

After a setHead, is it correct to return an ErrUnknownAncestor?

Honestly, I think it's the correct behavior. In this case we are trying to import a future block.

Feb 15 '23 11:02 rjl493456442

Am I not going to get paid? Um yeh I've heard about all the scams people do Frances Reid

Feb 15 '23 14:02 Francesreid
