
[bloatnet] explore initial sync using snapshots for --chain=perf-devnet-2

Open taratorio opened this issue 3 months ago • 9 comments

The EF team doing the state-bloat research is planning to run "initial sync tests" using Syncoor: https://syncoor.perf-devnet-2.ethpandaops.io/#/?directory=perf-devnet-2&network=perf-devnet-2

Currently, Erigon starts syncing from an old block (the block at which the shadowfork branched off of mainnet, https://etherscan.io/block/22758966 according to Pari) and then has to re-execute all blocks up to the tip of perf-devnet-2. This currently takes a very long time: the initial blocks of bloatnet were full 500 MGas blocks that create a lot of new state, and we are currently inefficient under such a heavy load. We are going to work on optimising this, but even once it is optimised we will still be re-executing a lot of blocks during initial sync tests for Syncoor. The resulting numbers won't be representative of our sync speed in an environment where the state is 2x/3x/4x the size of mainnet, since we aren't leveraging our snapshot sync and are re-executing the blocks instead.

One thing we can consider doing for this is to set up a snapshotter for perf-devnet-2 and add support for running Erigon with --chain=perf-devnet-2, so that initial sync can use bloatnet snapshots.
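As a rough illustration of how much work the re-exec path implies (a sketch only: the ~23.73M tip figure comes from later comments in this thread, and 500 MGas is the per-block ceiling during the bloat phase, not an average):

```go
package main

import "fmt"

func main() {
	const (
		forkBlock   = 22_758_966  // block where the shadowfork branched off mainnet
		approxTip   = 23_730_000  // rough perf-devnet-2 tip (see later comments in this thread)
		gasPerBlock = 500_000_000 // 500 MGas blocks during the initial bloat phase (upper bound)
	)
	blocks := approxTip - forkBlock
	totalGas := float64(blocks) * gasPerBlock
	fmt.Printf("blocks to re-exec: ~%d\n", blocks)         // ~971k blocks
	fmt.Printf("worst-case gas to re-exec: ~%.2e\n", totalGas) // ~4.9e14 gas
}
```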

taratorio avatar Sep 29 '25 13:09 taratorio

yperbasis added Imp2 as Importance

VBulikov avatar Oct 03 '25 10:10 VBulikov

update 01/12/25: we are syncing a new snapshotter for bloatnet based on commit 07e7965c48254d4378316cdc380ff27d36364d6e from the performance branch.

No changes to step size were made; the goal here is to reach the tip and make an easier sync available. Waiting for the chain tip.

wmitsuda avatar Dec 01 '25 21:12 wmitsuda

update 08/12: at block 22.89M using --sync.loop.block.limit=128. Will try doubling it to 256 and observing the effects.

wmitsuda avatar Dec 08 '25 17:12 wmitsuda

update 10/12: back to block 22.7M because chaindata was deleted 2 days ago; trying --sync.loop.block.limit=1024 in order to observe the effects on monitoring.

wmitsuda avatar Dec 10 '25 03:12 wmitsuda

it looks like my hypothesis was correct: a bigger --sync.loop.block.limit is actually better.

going from 256 -> 1024 increased the blocks per sync loop by 4x, but the time taken for exec+commitment increased by only ~2x:

256: 30min exec + 30min commitment
1024: 1h exec + 1h commitment

the tradeoff is losing more uncommitted work if Erigon is stopped, but here our goal is to reach the chain tip ASAP.

let's then increase --sync.loop.block.limit by another 4x, from 1024 to 4096 (which is closer to the default of 5000, but keeps us on multiples of the previous tests)

Image
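A quick back-of-envelope on the throughput numbers above (a sketch, using only the timings reported in this comment):

```go
package main

import "fmt"

func main() {
	// Timings reported above: exec + commitment wall time per sync loop.
	type run struct {
		loopLimit int     // value of --sync.loop.block.limit
		hours     float64 // exec + commitment time for one loop
	}
	runs := []run{
		{256, 0.5 + 0.5},  // 30min exec + 30min commitment
		{1024, 1.0 + 1.0}, // 1h exec + 1h commitment
	}
	for _, r := range runs {
		fmt.Printf("limit %4d: ~%.0f blocks/hour\n", r.loopLimit, float64(r.loopLimit)/r.hours)
	}
	// 4x the batch size for ~2x the wall time => roughly 2x effective throughput.
}
```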

wmitsuda avatar Dec 11 '25 20:12 wmitsuda

--sync.loop.block.limit=4096 does not work well on a 64GB machine: OOM

[1322923.224586] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/[email protected]/app.slice/erigon.service,task=erigon,pid=348048,uid=1000
[1322923.225486] Out of memory: Killed process 348048 (erigon) total-vm:4428816168kB, anon-rss:56179076kB, file-rss:7892kB, shmem-rss:0kB, UID:1000 pgtables:1402632kB oom_score_adj:200
[1322926.093216] oom_reaper: reaped process 348048 (erigon), now anon-rss:0kB, file-rss:340kB, shmem-rss:0kB

let's reduce and experiment with 2048
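For context, a quick conversion of the kernel's kB figures from the kill log above (a sketch; 64 GiB is the machine size mentioned in this thread):

```go
package main

import "fmt"

func main() {
	// Figures taken from the oom-kill line above (values are in kB).
	const (
		anonRSSKB  = 56_179_076    // resident anonymous memory of the erigon process
		totalVMKB  = 4_428_816_168 // virtual address space (largely mmap'd data, not resident)
		pgtablesKB = 1_402_632     // page tables
		machineGiB = 64.0          // RAM of the sync machine
	)
	toGiB := func(kb float64) float64 { return kb / (1024 * 1024) }
	fmt.Printf("anon RSS:    %.1f GiB of %.0f GiB\n", toGiB(anonRSSKB), machineGiB) // ~53.6 GiB
	fmt.Printf("page tables: %.1f GiB\n", toGiB(pgtablesKB))                       // ~1.3 GiB
	fmt.Printf("total VM:    %.0f GiB (virtual)\n", toGiB(totalVMKB))              // ~4.2 TiB virtual
	// anon RSS alone is ~53.6 GiB, so with --sync.loop.block.limit=4096 the process
	// effectively fills the 64 GiB box before the loop can commit.
}
```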

wmitsuda avatar Dec 13 '25 00:12 wmitsuda

more about the previous OOM:

Image

wmitsuda avatar Dec 13 '25 01:12 wmitsuda

at block ~23.73M now (end of 11/2025); it has passed the first bloat, and there is a second bloat post-Devconnect.

Image

so far my conclusions:

  • step size reduction didn't matter after all. The experiment showed little reduction in chaindata size because the data is already big. The snapshotter is running with no step size changes.
  • the assumption behind reducing --sync.loop.block.limit was wrong as well; on the contrary, reducing it makes sync times worse, as explained above.
  • increasing --sync.loop.block.limit revealed another bottleneck. The maximum I was able to run with was 1024 on a 64GB machine; more than that results in OOM, due to (probably, as far as I could track it with pprof) the batch of commitments being loaded into memory by TemporalMemBatch.
Image
  • "deleting chaindata" is not effective: there are a few transactions with a huge amount of data, so every time we delete it, the next restart has to recompute a lot of data and again takes a long time, hence step size reduction being ineffective.
  • the best approach was to just "let chaindata grow", "accept it will be slow, but it'll finish", and "tune --sync.loop.block.limit to avoid OOM" (see the back-of-envelope sketch after this list).
  • so far, chaindata is > 500GB, but interestingly most of it is reclaimable space. Not sure if that says something.
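A rough way to pick a limit for a given memory budget, assuming peak RSS grows roughly linearly with --sync.loop.block.limit (an assumption, not something measured precisely here) and anchoring on the one observed data point (limit 4096 peaked around 53.6 GiB of anon RSS before the OOM):

```go
package main

import "fmt"

func main() {
	// One observed data point from the OOM above: limit 4096 -> ~53.6 GiB anon RSS.
	const (
		observedLimit = 4096
		observedGiB   = 53.6
	)
	perBlockGiB := observedGiB / observedLimit // crude linear-scaling assumption

	budgetGiB := 40.0 // leave headroom below 64 GiB for page cache, snapshots, etc. (arbitrary choice)
	safeLimit := int(budgetGiB / perBlockGiB)
	fmt.Printf("~%.0f MiB per block in the loop batch (assumed linear)\n", perBlockGiB*1024)
	fmt.Printf("suggested --sync.loop.block.limit for a %.0f GiB budget: ~%d\n", budgetGiB, safeLimit)
	// With these numbers this lands around 3000; in practice 1024 was the maximum
	// that stayed stable on the 64 GB machines used in this thread.
}
```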

wmitsuda avatar Dec 18 '25 17:12 wmitsuda

@wmitsuda you've misunderstood the reason behind reducing --sync.loop.block.limit. It was not to increase sync speed (nobody said that). It was to gather finer-grained metrics about table sizes (the chart we have in Grafana).

taratorio avatar Dec 19 '25 03:12 taratorio

alright, that was probably my assumption then

wmitsuda avatar Dec 19 '25 07:12 wmitsuda

for the record, the bootnode that ethpandaops is running shows similar behavior: also a 64GB machine, stuck around the same block, OOMing repeatedly.

I recommended --sync.loop.block.limit=1024 to constrain mem usage.

wmitsuda avatar Dec 19 '25 10:12 wmitsuda