nimbus-eth1
Excessive memory consumption when syncing a long way up to the `canonical head`
Since PR #3191, the Nimbus EL has an annoying memory problem in the FC module because the syncer no longer updates the base while importing blocks. This happens at least when the syncer has to catch up over a long distance.
Previously, there was a kludge related to the syncer which used the forkChoice() function to update the base.
Now the base can only be updated if the CL triggers a forkChoiceUpdated, which has no effect if the update is out of scope for the FC module; this in turn happens when syncing from an old or pristine database state.
In fact, this leads to a situation similar to when mainnet was unable to finalise transactions globally.
For the attached screenshot, I ran the syncer overnight (with the CL turned off) and observed the following memory usage in the morning:
- 78.9GiB virtual (from metrics screen)
- 41.4GiB physical (from metrics screen)
- 22GiB extra swap space freed after stopping the process
It seems a big machine can handle the situation to an extent, but the execution throughput decreases.
@mjfh is #3202 the PR you meant to link?
Oops, that was the wrong one -- lol. Thanks for noticing.
It was somehow related to issue #3202 :)
fixed by #3204
Looks like the problem is not cured thoroughly. Needs more investigation.
> Looks like the problem is not cured thoroughly. Needs more investigation.
Has the memory usage improved compared to before, or is it exactly the same as before?
When syncing with hoodi using an empty database, initially everything looks OK and the base can move forward (with #3237 applied).
But when the gap is wider, the base stops moving. I'm not sure why the CL suddenly requests sync from a head < 10K, then jumps to 200K+.
The problem is that the syncer downloads forward from the known FC base (even though the segment requests are reversed), while the FC expects the syncer to download backward from the head.
Of course the FC's expectation is not satisfied by the syncer because the finalized hash (pendingFCU) is not resolved into latestFinalizedNumber.
IIRC from the Discord discussion, we agreed the syncer has two phases:
- Download headers from head to the known base and put them into the cache. Possibly also start a new session if the CL requests a new target.
- Then download the block bodies forward and import them into the FC.
That is how I assume the syncer works (see the sketch below). But it looks like it doesn't work like that.
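For clarity, here is a minimal sketch of those two phases as I understand them; the names (`SyncSession`, `downloadHeaders`, `downloadBodies`) are made up for illustration and are not the actual nimbus-eth1 syncer API.

```nim
type
  SyncSession = object
    base, head, target: uint64    # block numbers known at session start
    cachedHeaders: seq[uint64]    # header numbers collected in phase 1

proc downloadHeaders(s: var SyncSession) =
  # Phase 1: walk backward from the target down to the block just above
  # the known base, caching every header (network fetch elided).
  var n = s.target
  while n > s.base:
    s.cachedHeaders.add n
    dec n

proc downloadBodies(s: SyncSession) =
  # Phase 2: fetch bodies in ascending order and import each block into
  # the FC module (fetch and import elided).
  for n in s.base + 1 .. s.target:
    discard n
```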
> When syncing with hoodi using an empty database, initially everything looks OK and the base can move forward (with #3237 applied).
I observed the same in general although there was an outlier on hoodi when the CL was not fully in sync.
> [..] IIRC from the Discord discussion, we agreed the syncer has two phases:
>
> - Download headers from head to the known base and put them into the cache. Possibly also start a new session if the CL requests a new target.
> - Then download the block bodies forward and import them into the FC.
>
> That is how I assume the syncer works. But it looks like it doesn't work like that.
That is exactly how it works apart from the fact that the CL cannot start a new syncer session while the current one is running.
```
Sync session 1
base=5324 head=5535 target=8539
download headers 8539..5536
resolved fin = 8468
download bodies 5536..8539

Sync session 2
base=8320 head=8539 target=9227
download headers 9227..8540
resolved fin = 9141
download bodies 8540..9227

Sync session 3
base=#8988 head=#9227 target=#9531
download headers 9531..9228
resolved fin = 9437
download bodies 9228..9531

Sync session 4
base=#9292 head=#9531 target=#259894
download headers 259894..9532
resolved fin = 9894 # <------------- ????????
download bodies 9532..... way past resolved fin, base is not moving anymore during this session lifetime.
```
EL=nimbus CL=nimbus
Both FC and syncer expect the CL to give a finalized hash near the target, not near the head.
The above sync sessions happened when I synced with hoodi. The question is: why does the CL send a finalized hash far from the target? Considering this, the syncer cannot just ignore the finalized block if the CL behaves like this.
Do you have the actual fCUs the CL is sending?
The body download starts with a block number where the header has a parent on the FC module -- no finalised header involved here. In practice, this first block number is often the largest such number (unless some RPCs squeezed in).
This state (that the collected chain has an FC module parent) is signalled by the header cache module.
My take was that the syncer should (and does) neither know nor care about the finalised hash and its block header resolution.
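A minimal sketch of that start condition, i.e. body download beginning at the lowest cached header whose parent is already on the FC module; `Header`, `firstImportableIndex`, and the table of FC blocks are hypothetical stand-ins, not the real HeaderChainCache or FC interfaces.

```nim
import std/tables

type Header = object
  number: uint64
  parentHash: string

proc firstImportableIndex(cachedHeaders: seq[Header],
                          fcBlocks: Table[string, uint64]): int =
  ## Index of the lowest cached header whose parent is known to the FC
  ## module, or -1 if the collected chain is not yet linked to it.
  for i, h in cachedHeaders:
    if h.parentHash in fcBlocks:
      return i
  return -1
```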
To add, in general, the CLs will send fCUs corresponding to whatever they think the current (head, safe, finalized) EL blocks are. They don't, per se, have a notion of "target".
> To add, in general, the CLs will send fCUs corresponding to whatever they think the current (head, safe, finalized) EL blocks are. They don't, per se, have a notion of "target".
The name target is used for syncer logging to tell a sort of comprehensive story. It is the local target the syncer attempts to reach.
Yeah, I understand. But in general, in a well-functioning network, the (head, safe, finalized) epochs in fCU are usually (not always) (n, n-1, n-2).
Is that being seen here?
Here is what is being seen:
H=Head, B=Base, F=Finalized
A few early/short sessions: B......F.H # F is near H
Then the CL will send a very long session: B..F..............................H # F is near B
During this long session, the CL gradually updates F forward in random steps. The steps are small, for example: B=50K, H=270K, F=52K, steps: 27....54
Then, around F=77K, the CL stops updating F.
If the CL kept updating F, we could formulate a strategy. But because it stops updating, the excessive memory consumption will always repeat.
I don't know how other CLs behave.
If you look at the CL logs (e.g., look at the nimbus-eth2 Slot start logs to compare the head and finalized epochs), is F lagging H there too?
> is F lagging H there too?
```
INF 2025-04-28 07:32:24.047+07:00 Slot start topics="beacnde" sync="15h01m (25.97%) 4.0891slots/s (DDPQDQDDDP:77631)/opt - lc: e81c4219:298910" finalized=2424:3abe601d delay=56ms444us575ns slot=298912 epoch=9341 peers=16 head=87c988c0:77642
```
It looks like the finalized hash the CL sends depends on the progress of the CL sync: CL-F epoch=2424, CL-H epoch=9341, progress=25.97%.
If the CL sees that the EL has already synced past its own progress, it will stop sending new H and F.
I want to propose changes to the EL syncer: instead of downloading headers interleaved with block bodies, we separate the syncer into two parts (a rough sketch follows at the end of this comment):
- Syncer-H: responsible for downloading block headers backward; it can start a new session without waiting for block bodies.
- Syncer-B: responsible for downloading block bodies forward after F is resolved. This syncer downloads up to F, then stops. If a new F is resolved, it resumes downloading up to the new F. This repeats while the distance between H and F is > D.
- If the distance is small enough, download bodies up to H.
The reason for this complication is to keep the CL sending new F updates without the EL progressing too far beyond the CL's sync progress.
But there is one problem: should D be calculated dynamically, or should it be a constant?
- If calculated, based on what?
- If it is a constant, what is the value?
- Or can we remove it completely, and will the CL still think we are in sync?
Note:
- There are no changes to what the syncer should know. It merely does what the CL tells it to do: download blocks.
- But how F is resolved involves both FC and the HeaderChainCache (HCC). The header chain stored in the database can be modified slightly to also store a hash-to-number mapping every time a new header is stored.
- Should we also integrate FC with HCC, so that the only one responsible for resolving F is still FC?
- FC is still the one that decides when to move the base forward.
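To make the Syncer-B gating rule concrete, here is a rough sketch under the assumptions above; the `D` constant, the names, and the state layout are illustrative only, not a concrete implementation of the proposal.

```nim
const D = 64'u64    # hypothetical threshold for "close enough to the head"

type SyncState = object
  head, resolvedFin, imported: uint64   # block numbers

proc bodyTarget(s: SyncState): uint64 =
  ## How far Syncer-B may import right now.
  if s.resolvedFin + D >= s.head:
    result = s.head          # near the head: download bodies up to H
  else:
    result = s.resolvedFin   # otherwise stop at the last resolved F

proc runSyncerB(s: var SyncState) =
  # Import bodies forward up to the current target; called again whenever
  # a new F is resolved (body fetch and FC import elided).
  let target = s.bodyTarget()
  while s.imported < target:
    inc s.imported
```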
> finalized=2424:3abe601d
this is the epoch number, ie 2424*32 = slot 77568 and the head in this log is at 77642 - there is no (significant) gap.
the epoch=9341 in the log is the "wall clock", while head is how far the CL has synced.
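For reference, the arithmetic behind that statement (32 slots per epoch):

```nim
const SLOTS_PER_EPOCH = 32'u64
let finalizedSlot = 2424'u64 * SLOTS_PER_EPOCH   # = 77568
let headSlot = 77642'u64                         # head from the log above
assert headSlot - finalizedSlot == 74            # only ~2 epochs behind
```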
> then stop
this is not where the issue lies, generally .. ie something else is preventing finalized from being updated. There's no reason for the CL to "hold back" finalized updates, but more broadly, the proposed algorithm wouldn't work when the chain is not finalizing - without finality, the gap between H and F is expected to grow (and we'll solve that by keeping block bodies on disk also for non-final blocks).
It's because of the LC -- the LC gets head but (correctly) doesn't update finalized. This isn't a bug, it's by design. The EL sync should handle it properly.
> the epoch=9341 in the log is the "wall clock", while head is how far the CL has synced.
That is where the problem is. The CL sends FCU to the EL:
- H: a block hash from epoch 9341, far from both the CL "synced head" and F.
- F: a block hash from epoch 2424, near the CL "synced head".
And this creates a huge gap in the EL. The EL knows nothing about the CL's "synced head".
The algorithm will work. If there is no finality, the "B-Syncer" will do nothing; it will keep waiting for a valid F from the CL.
> The algorithm will work. If there is no finality, the "B-Syncer" will do nothing; it will keep waiting for a valid F from the CL.
Well, it only works within a finalizing network, and it also doesn't support syncing using the Nimbus light client. But yes, it avoids the excessive memory consumption.
oof, you're right, forgot about the lc stuff :/
btw, now that the head block is stored in the database, memory consumption is not really a problem any more - block bodies can be stored in db for branches as long as they get deleted if no longer needed.
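A minimal sketch of that idea, assuming a simple hash-keyed store; the `BodyStore` type and the pruning trigger are assumptions, not the actual nimbus-eth1 storage layer.

```nim
import std/[tables, sets]

type BodyStore = object
  bodies: Table[string, seq[byte]]   # blockHash -> encoded body

proc put(store: var BodyStore, blockHash: string, body: seq[byte]) =
  # Persist a non-final body; in the real client this would go to the db.
  store.bodies[blockHash] = body

proc pruneStale(store: var BodyStore, reachable: HashSet[string]) =
  ## Delete bodies whose block is no longer on any live branch.
  var stale: seq[string]
  for h in store.bodies.keys:
    if h notin reachable:
      stale.add h
  for h in stale:
    store.bodies.del h
```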
> The algorithm will work. If there is no finality, the "B-Syncer" will do nothing; it will keep waiting for a valid F from the CL.
then we can't sync the client to start validating, which contributes to the non-finality.
~~If we want to participate in a non-finalizing network, we need to execute the chain/blocks. If we only download headers and bodies and store them in the database, we are in a neutral position, keeping both the "valid" and "invalid" branches (depending on who has the bug). And in a neutral position, we contribute nothing to the network except confirming non-finalization, which is almost nothing.~~
-- oops: the client should be in a synced state. Yeah, I get it now.
But I think we still don't get it right: non-finalization will still pose a problem if we don't design the FC around that too. Our current design centers the FC around finality. That only works for a finalizing network.
We should shift to handle both non-finalization and finalization evenly to get it right. Argh, I hope there is something in the database to help with this.
Also, it's not only non-finalizing networks, though that is part of it; it would also recreate the equivalent of
- https://github.com/NethermindEth/nethermind/issues/6338