lodestar
lodestar copied to clipboard
Performance issue regen state
Describe the bug
Reload state + regen takes ~13s on mainnet
Received gossip block should be logged before the validation time, it showed 13.2s as validation time
126079:Jul-30 01:41:31.895[network] debug: Received gossip block slot=9623304, root=0x0954…7537, curentSlot=9623305, peerId=16Uiu2HAmNCh2yBozfk5dujuWpUWCXzQdXbUK3VoVcHg1S8ZRW4Rt, delaySec=7.585000038146973, pending=null, haveBlobs=6, expectedBlobs=6, recvToValLatency=0.07299995422363281, recvToValidation=13.309999942779541, validationTime=13.236999988555908
it mean the validation started at ~01:41:18.658
2s was for reloading state
125805:Jul-30 01:41:19.084[chain] debug: Reload: read state successful persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42
125822:Jul-30 01:41:19.084[chain] verbose: Reload: use block state as seed state slot=9623302
125823:Jul-30 01:41:19.084[chain] debug: Reload: found seed state persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42, seedSlot=9623302
125824:Jul-30 01:41:20.945[chain] debug: Reload: cached state load successful persistedKey=0xb796040000000000d572d4cf29fadad5ee3bd7fcfe2d9e8ab762d44d3fb58d36ac06682b36b34a42, stateSlot=9623264, stateRoot=0xcdbc3aa5ffa8e44273870bf29cb4686bddcb4448d7b12f715a77d2b8f7d41ce7, seedSlot=9623302
8.6s was for regen state by running through 23 blocks
125825:Jul-30 01:41:20.945[chain] debug: Replaying blocks to get state stateRoot=0x49871dd7b44f68e98c7cb0677a1b25bdac0ab05fe62dada2aa0ccdcb37f7884f, replaySlots=9623287,9623286,9623285,9623284,9623283,9623282,9623281,9623280,9623279,9623278,9623277,9623276,9623275,9623274,9623273,9623272,9623271,9623270,9623269,9623268,9623267,9623266,9623265
126064:Jul-30 01:41:29.532[chain] debug: Replayed blocks to get state stateRoot=0x49871dd7b44f68e98c7cb0677a1b25bdac0ab05fe62dada2aa0ccdcb37f7884f, replaySlots=9623287,9623286,9623285,9623284,9623283,9623282,9623281,9623280,9623279,9623278,9623277,9623276,9623275,9623274,9623273,9623272,9623271,9623270,9623269,9623268,9623267,9623266,9623265
remaining time (~4.1s) was to run through epoch transition because block slot 9623287 and 9623304 stay on different epoch
126107:Jul-30 01:41:33.645[chain] verbose: Block processed slot=9623304, root=0x09548c9b217164e5386eb45c9324918f4b5407670cbcd8e7fa5b806dbcc57537, delaySec=22.644999980926514
part of #5968
Expected behavior
- Separate the log
Received gossip blocktoReceived gossip blockandValidated gossip block - Average block processed time is 8.6s/23 = 374ms which is too much. Normally it should be <= 70ms
- All of the above information should be tracked by metrics
Steps to reproduce
Run with minimal config of n-historical state: 'chain.nHistoricalStates: true','chain.maxBlockStates: 1','chain.maxCPStateEpochsInMemory: 0'
Additional context
No response
Operating system
Linux
Lodestar version or commit hash
v1.20.0
The block process time is not bad in the above time range
it's likely most of the time was spent for loading block from db + deserialize it + wait for nextEventLoop()
Average block processed time is 8.6s/23 = 374ms which is too much. Normally it should be <= 70ms
if we also compute hashTreeRoot() on beacon-node side, to process a block it takes ~120ms - 130ms on mainnet
this is on the stable mainnet node, v1.21.0
this is getPreState() time on the feat4 mainnet node with PR #7033
it was reduced from 13.2s (as in theDescription) to 7.29s which is likely the worse case scenario, this includes:
- <= 1.5s for loading checkpoint state
- <= 120ms * 31 blocks = 3.72s to reprocess blocks
- 2s for epoch transition
this is for mainnet