lodestar icon indicating copy to clipboard operation
lodestar copied to clipboard

[bun] processFinalizedCheckpoint performance issue

Open twoeths opened this issue 3 months ago • 6 comments

Describe the bug

it takes up to 40s to process finalized checkpoints

2084999:Oct-13 10:21:12.021[chain]         verbose: Start processing finalized checkpoint epoch=47231, rootHex=0x8a726696e9b5376dfdcc840fda61b61ac183f5f4ae39e0a1e0cea1f5563c2b6e
2085699:Oct-13 10:21:56.670[chain]         verbose: Finish processing finalized checkpoint epoch=47231, rootHex=0x8a726696e9b5376dfdcc840fda61b61ac183f5f4ae39e0a1e0cea1f5563c2b6e, prunedBlocks=29
2097567:Oct-13 10:27:36.037[chain]         verbose: Start processing finalized checkpoint epoch=47232, rootHex=0xe85dd90f32450a2359d3c46b0c008b17905261af8ff340e0b9c37fd9a7620ba2
2097579:Oct-13 10:28:02.774[chain]         verbose: Finish processing finalized checkpoint epoch=47232, rootHex=0xe85dd90f32450a2359d3c46b0c008b17905261af8ff340e0b9c37fd9a7620ba2, prunedBlocks=31
2115717:Oct-13 10:34:00.111[chain]         verbose: Start processing finalized checkpoint epoch=47233, rootHex=0xd960c80f0adad750321afdf3f849458ed66aa43a4de8622ab146cdfb80c005c7
2115784:Oct-13 10:34:27.597[chain]         verbose: Finish processing finalized checkpoint epoch=47233, rootHex=0xd960c80f0adad750321afdf3f849458ed66aa43a4de8622ab146cdfb80c005c7, prunedBlocks=30
2134326:Oct-13 10:40:24.039[chain]         verbose: Start processing finalized checkpoint epoch=47234, rootHex=0x2b063a21f50a29c0f5da082340628fe76c83b571321844b8dcd9c89a7069492f
2134585:Oct-13 10:40:51.423[chain]         verbose: Finish processing finalized checkpoint epoch=47234, rootHex=0x2b063a21f50a29c0f5da082340628fe76c83b571321844b8dcd9c89a7069492f, prunedBlocks=31
2150028:Oct-13 10:46:48.018[chain]         verbose: Start processing finalized checkpoint epoch=47235, rootHex=0xb7d87abe3b2df4199ea95ee3f956b84c857bc51f5358e98bee3c3bf5483e36be
2150044:Oct-13 10:47:16.420[chain]         verbose: Finish processing finalized checkpoint epoch=47235, rootHex=0xb7d87abe3b2df4199ea95ee3f956b84c857bc51f5358e98bee3c3bf5483e36be, prunedBlocks=28
2183293:Oct-13 10:59:36.021[chain]         verbose: Start processing finalized checkpoint epoch=47237, rootHex=0xf7795011e4b419ae4c0d445f79a7a0f1ddd8ed8f0a86bcbeb63340864aeb6666
2183312:Oct-13 11:00:06.860[chain]         verbose: Finish processing finalized checkpoint epoch=47237, rootHex=0xf7795011e4b419ae4c0d445f79a7a0f1ddd8ed8f0a86bcbeb63340864aeb6666, prunedBlocks=30

this causes hiccup processing gossip messages

Image

Expected behavior

it should take the same time to NodeJS, which is <3s consistently

Steps to reproduce

No response

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

unstable

twoeths avatar Oct 13 '25 11:10 twoeths

we archive states at every finalized epoch which is not correct

2025-10-13 18:12:57.067 | Oct-13 11:12:57.067[chain]         verbose: Archived state completed finalizedEpoch=47239, minEpoch=46080, storedStateSlots=1474560,1507328,1511616, statesSlotsToDelete=1511616 |  
-- | -- | --
  |   | 2025-10-13 18:06:30.527 | Oct-13 11:06:30.527[chain]         verbose: Archived state completed finalizedEpoch=47238, minEpoch=46080, storedStateSlots=1474560,1507328,1511584, statesSlotsToDelete=1511584 |  
  |   | 2025-10-13 18:00:06.654 | Oct-13 11:00:06.654[chain]         verbose: Archived state completed finalizedEpoch=47237, minEpoch=46080, storedStateSlots=1474560,1507328,1511520,1511552, statesSlotsToDelete=1511520,1511552 |  
  |   | 2025-10-13 17:47:16.209 | Oct-13 10:47:16.209[chain]         verbose: Archived state completed finalizedEpoch=47235, minEpoch=46080, storedStateSlots=1474560,1507328,1511488, statesSlotsToDelete=1511488 |  
  |   | 2025-10-13 17:40:51.261 | Oct-13 10:40:51.261[chain]         verbose: Archived state completed finalizedEpoch=47234, minEpoch=46080, storedStateSlots=1474560,1507328,1511456, statesSlotsToDelete=1511456 |  
  |   | 2025-10-13 17:34:27.371 | Oct-13 10:34:27.371[chain]         verbose: Archived state completed finalizedEpoch=47233, minEpoch=46080, storedStateSlots=1474560,1507328,1511424, statesSlotsToDelete=1511424 |  
  |   | 2025-10-13 17:28:02.547 | Oct-13 10:28:02.547[chain]         verbose: Archived state completed finalizedEpoch=47232, minEpoch=46080, storedStateSlots=1474560,1507328,1511392, statesSlotsToDelete=1511392 |  
  |   | 2025-10-13 17:21:56.164 | Oct-13 10:21:56.164[chain]         verbose: Archived state completed finalizedEpoch=47231, minEpoch=46080, storedStateSlots=1474560,1507328,1511360, statesSlotsToDelete=1511360 |  
  |   | 2025-10-13 17:15:31.591 | Oct-13 10:15:31.591[chain]         verbose: Archived state completed finalizedEpoch=47230, minEpoch=46080, storedStateSlots=1474560,1507328,1511328, statesSlotsToDelete=1511328


vs NodeJS


2025-10-13 17:40:26.037 | Oct-13 10:40:26.037[chain]         verbose: Archived state completed finalizedEpoch=47234, minEpoch=46080, storedStateSlots=1474624,1507392,1510464, statesSlotsToDelete=1510464 |  
-- | -- | --
  |   | 2025-10-13 14:15:38.068 | Oct-13 07:15:38.068[chain]         verbose: Archived state completed finalizedEpoch=47202, minEpoch=46080, storedStateSlots=1474624,1507392,1509440, statesSlotsToDelete=1509440 |  
  |   | 2025-10-13 10:50:50.246 | Oct-13 03:50:50.246[chain]         verbose: Archived state completed finalizedEpoch=47170, minEpoch=46080, storedStateSlots=1474624,1507392,1508416, statesSlotsToDelete=1508416 |  
  |   | 2025-10-13 07:26:02.316 | Oct-13 00:26:02.316[chain]         verbose: Archived state completed finalizedEpoch=47138, minEpoch=46080, storedStateSlots=1474624,1507392, statesSlotsToDelete=


twoeths avatar Oct 13 '25 11:10 twoeths

looks like db.stateArchive.lastKey() returned the wrong value

Image

same to first time starting beacon-node


2025-10-13 19:36:17.713 | Oct-13 12:36:17.713[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
-- | -- | --
  |   | 2025-10-13 19:23:51.450 | Oct-13 12:23:51.449[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
  |   | 2025-10-13 17:54:07.285 | Oct-13 10:54:07.285[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
  |   | 2025-10-13 17:22:49.695 | Oct-13 10:22:49.694[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
  |   | 2025-10-13 17:11:42.232 | Oct-13 10:11:42.232[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
  |   | 2025-10-13 15:12:30.545 | Oct-13 08:12:30.545[]              verbose: Found the last archived state slot=39936, size=145.10 MB |  
  |   | 2025-10-13 14:56:43.361 | Oct-13 07:56:43.361[]              verbose: Found the last archived state slot=39936, size=145.10 MB


twoeths avatar Oct 13 '25 12:10 twoeths

looks like reverse option was not respected

Image

twoeths avatar Oct 13 '25 12:10 twoeths

detailed log of a finalized epoch processing

Oct-13 10:14:48.033[chain]         ^[[36mverbose^[[39m: Start processing finalized checkpoint epoch=47230, rootHex=0x2958646b9763ed911998d336f76a091bcf2d4b24aaa58e2dc986441846dc2bd8

Oct-13 10:14:48.058[chain]         ^[[36mverbose^[[39m: Migrated blocks from hot DB to cold DB fromSlot=1511360, toSlot=1511329, size=29
Oct-13 10:14:48.226[chain]         ^[[36mverbose^[[39m: Migrated blobSidecars from hot DB to cold DB migratedEntries=29
Oct-13 10:14:48.269[chain]         ^[[36mverbose^[[39m: blobSidecars prune: batchDelete range 1380320..1380351
Oct-13 10:14:48.270[chain]         ^[[36mverbose^[[39m: Archiving of finalized blocks complete totalArchived=29, finalizedEpoch=47230

Oct-13 10:15:23.049[chain]         ^[[36mverbose^[[39m: Archived finalized state epoch=47230, slot=1511360, root=0x2958646b9763ed911998d336f76a091bcf2d4b24aaa58e2dc986441846dc2bd8
Oct-13 10:15:31.591[chain]         ^[[36mverbose^[[39m: Archived state completed finalizedEpoch=47230, minEpoch=46080, storedStateSlots=1474560,1507328,1511328, statesSlotsToDelete=1511328
Oct-13 10:15:31.591[chain]         ^[[36mverbose^[[39m: Pruned finalized checkpoints states for epoch epoch=47229, persistCount=0, rootHexes=0x2b26a756a74597ea314be4cdcbb41783c88826aec49fb7beae0d9e85fddfdacd
Oct-13 10:15:32.082[chain]           ^[[34mdebug^[[39m: updated backfilledRanges key=1511360, value=1511328
Oct-13 10:15:32.082[chain]         ^[[36mverbose^[[39m: Finish processing finalized checkpoint epoch=47230, rootHex=0x2958646b9763ed911998d336f76a091bcf2d4b24aaa58e2dc986441846dc2bd8, prunedBlocks=29

there is some blockers before/after maybeArchiveState() even if we have to serialize / store a finalized state, it should not be that slow

twoeths avatar Oct 14 '25 02:10 twoeths

after #8529 there is still a performance issue with const lastStoredSlot = await this.db.stateArchive.lastKey(); which blocks the main thread @wemeetagain

Image

twoeths avatar Oct 16 '25 10:10 twoeths

update: deployed to another lg1k node and it said const lastStoredSlot = await this.db.stateArchive.lastKey(); is only blocked for 2.6s - still not great but way less than the above beacon node

Image

profile show that 500ms was blocked by leveldb_iterator_prev function, remaining was due to I/O being blocked I think

Image

maybeArchiveState_oct_17.json.zip

twoeths avatar Oct 17 '25 10:10 twoeths