Rate limit historic block backfill
Description
Several users here (https://github.com/sigp/lighthouse/issues/2904) and on Discord have reported their nodes becoming overwhelmed during backfill sync.
We haven't identified a specific issue with backfill sync that could cause this, but we could defensively add a flag to rate-limit backfill sync, to prevent it from overwhelming other functionality.
I think we should probably do the rate-limiting in sync itself, so that it doesn't trip over itself trying to download more blocks than the DB can handle. The rate-limit could maybe be as simple as a configurable delay after each batch, which should give the rest of the node some room to breathe. Someone more familiar with sync would have a better idea (@AgeManning @divagant-martian).
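A toy sketch of the "configurable delay after each batch" idea, with entirely hypothetical names (`BackfillConfig`, `process_batches`, etc. do not exist in Lighthouse); as the next comment notes, sync isn't async, so it wouldn't actually be this simple:

```rust
use std::time::Duration;

// Illustrative only: none of these names exist in Lighthouse.
struct Batch(u64); // stand-in for a downloaded batch of blocks

struct BackfillConfig {
    /// Pause inserted after each processed batch; zero disables the limit.
    batch_delay: Duration,
}

fn process_batch(batch: &Batch) {
    // Placeholder for block verification/import.
    println!("processed batch starting at slot {}", batch.0);
}

fn process_batches(config: &BackfillConfig, batches: &[Batch]) {
    for batch in batches {
        process_batch(batch);
        // Give the rest of the node some room to breathe before continuing.
        if !config.batch_delay.is_zero() {
            std::thread::sleep(config.batch_delay);
        }
    }
}
```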
Since the beacon processor already handles a sort of load balancing, prioritizing some jobs over others, it would be easier to handle this there. Sync isn't async, so adding delays wouldn't be that simple. Thoughts?
Oh yeah, I hadn't considered that + assumed sync was async
Some thoughts - I don't think we should go too fancy and have dynamic adjustments to load based on processing time in the queue (but we could if someone really wanted to). Some simple ways to approach this:
- Configurable batch buffer sizes - We have a fixed buffer size, which keeps the processor busy with blocks (depending on its speed). If we set the batch buffer size to 1, for example, then we would process a block and then start downloading another. This would give some relief.
- As diva suggested, add a delay to sending back the processed block. Sync will only progress once a block's processing result has been submitted, so if we artificially slow down that step, sync will slow down accordingly.
- Adjust priorities in the beacon processor. Perhaps we could only process a beacon block from backfill once all other queues are depleted (sketched after this list). As backfill is not very important, it should be fine if a block sits in the queue indefinitely; sync will just wait until it eventually goes through.
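A rough sketch of what that third option could look like in the beacon processor's dequeue logic; the `Queues` struct and queue names are simplified stand-ins, not the real Lighthouse types:

```rust
use std::collections::VecDeque;

struct Work; // stand-in for a unit of beacon-processor work

/// Simplified stand-ins for the beacon processor's queues.
struct Queues {
    gossip_blocks: VecDeque<Work>,
    attestations: VecDeque<Work>,
    backfill_batches: VecDeque<Work>,
}

impl Queues {
    /// Strict priority: backfill work is only dequeued once every other
    /// queue is empty.
    fn next_work(&mut self) -> Option<Work> {
        self.gossip_blocks
            .pop_front()
            .or_else(|| self.attestations.pop_front())
            .or_else(|| self.backfill_batches.pop_front())
    }
}
```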
Just my 2 cents.
Agree on not doing fancy load balancing or anything like that - just reviewing job priority in the beacon processor. If it's already low priority and it's still interfering with other jobs, lowering the batch size should help. If we "add a delay", I would think about it not on the job but on the queues, throttling how often we start processing them. Given the options, after a review/adjustment of job priority I think batch size would be the most helpful.
I'll handle this as we agreed, but I'm curious: why is this assumed to be related to backfill sync?
The evidence is pretty much anecdotal: Juan noticed it on a bunch of VPS nodes he manages (https://github.com/sigp/lighthouse/issues/3207#issuecomment-1134521896), and another Discord user was having trouble, but I just checked the message history and it turns out they were running an HDD :scream:
Ahhh, @divagant-martian you were totally right about the prioritisation though. I think backfill batches count as chain segments so they're being processed first here: https://github.com/sigp/lighthouse/blob/aa72088f8fc91d41106a8afce7a0179cde64ce5d/beacon_node/network/src/beacon_processor/mod.rs#L967-L970
Personal experience: I was using a VPS (now migrated) with an SSD, and I experienced missed attestations during the historic block sync after a checkpoint sync. I do believe this move to making that job a lower priority is the right path. What matters most after a sync is continued operation; the backfill will finish when it finishes.
Any update here?
Yeah. There is more complexity here than we anticipated, so it's not a straightforward fix.
We're currently focusing on developing and testing protocol updates for the next hard fork, so this is being left as a lower-priority task.
I imagine this only disrupts a small number of nodes, and only during backfill sync. If there is greater urgency or a bigger issue here, we can re-evaluate our priorities.
I wanted to add some notes after a discussion with @divagant-martian and @AgeManning this morning.
Broadly, there are three components involved in back-filling right now:
- The "networking stack"
  - Determines the required blocks and downloads them from peers.
- The `BeaconProcessor`
  - Receives unvalidated blocks from the networking stack and queues them for validation/import.
- The `BeaconChain`
  - Receives batches dequeued by the `BeaconProcessor` and verifies them, ultimately sending a result back to the networking stack so it can get more blocks.
The current consensus for rate-limiting is that it should happen in the `BeaconProcessor`. This is because the networking stack makes no assumptions about how fast the batches are processed, and the `BeaconChain` makes no assumptions about how frequently it should process backfill batches. Therefore, it seems fine for the `BeaconProcessor` to receive batches and arbitrarily delay their processing as it sees fit.
The `BeaconProcessor` is effectively a bunch of FIFO/LIFO queues and a loop routine that pops messages out of an `event_rx: mpsc::Receiver<WorkEvent<T>>` channel and either:
- Processes the event immediately if there is a free "worker".
- If there are no free workers, queues the message for later processing.
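For intuition, a heavily simplified sketch of that dispatch pattern, assuming a tokio runtime; `manager_loop`, `spawn_worker` and the bare `WorkEvent` are stand-ins, not Lighthouse's actual types:

```rust
use std::collections::VecDeque;
use tokio::sync::mpsc;

struct WorkEvent; // stand-in for the real WorkEvent<T>

async fn manager_loop(mut event_rx: mpsc::Receiver<WorkEvent>, max_workers: usize) {
    let mut idle_workers = max_workers;
    let mut queue: VecDeque<WorkEvent> = VecDeque::new();

    while let Some(event) = event_rx.recv().await {
        if idle_workers > 0 {
            // A worker is free: process the event immediately.
            idle_workers -= 1;
            spawn_worker(event);
        } else {
            // No free workers: queue the event for later processing.
            queue.push_back(event);
        }
    }
}

fn spawn_worker(_event: WorkEvent) {
    // Placeholder: the real processor hands the work to a blocking task and
    // is notified when a worker becomes idle again, so queued work is drained.
}
```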
We want backfill batches to follow a different flow. I believe that flow should look like this:
- A backfill batch from the network immediately goes into a newly-added FIFO queue ("newly-added" as in added by the PR that addresses this issue).
- A newly-added routine fires an event at certain intervals, which tries to pop a backfill batch from the FIFO queue.
- When a backfill batch is popped from the queue, it is sent to the existing `event_rx` channel, where the `BeaconProcessor` will either process it immediately or queue it for processing with the next free worker (we can probably use the existing backfill queue for this).
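A minimal sketch of that proposed flow, assuming a tokio runtime; `spawn_backfill_scheduler`, the shared queue handle and the `WorkEvent` wrapper are hypothetical names, and a fixed ticker stands in for whatever schedule ends up being chosen:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use tokio::sync::mpsc;

struct BackfillBatch; // stand-in for a batch received from the network
struct WorkEvent(BackfillBatch); // stand-in for the real work event type

/// Periodically pop one scheduled batch and feed it back into the existing
/// event channel, where it is processed or queued like any other work.
fn spawn_backfill_scheduler(
    scheduled_queue: Arc<Mutex<VecDeque<BackfillBatch>>>,
    event_tx: mpsc::Sender<WorkEvent>,
    interval: Duration,
) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(interval);
        loop {
            ticker.tick().await;
            let batch = scheduled_queue.lock().unwrap().pop_front();
            if let Some(batch) = batch {
                // Ignore send errors (processor shutting down) in this sketch.
                let _ = event_tx.send(WorkEvent(batch)).await;
            }
        }
    });
}
```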
This solution would allow us to do two things:
- Slow the import of batches, therefore reducing the total CPU time spent on backfill per slot.
- Perform backfill batch processing at very specific points in time. For example, we could do it half-way through each slot when we know that we're probably not producing/processing blocks/attestations.
Setting the batch import interval
When slowing the import of batches, it's worth considering how slow we're making it. Nodes are required to store at least 33,024 epochs (1,056,768 slots, ~5 months) of blocks to serve them to p2p peers (see MIN_EPOCHS_FOR_BLOCK_REQUESTS).
The batch size is currently 64 slots, so if we're processing one batch of blocks per slot then we're looking at a backfill time of 1,056,768 / 64 * 12 = 198,144 seconds (~2.29 days). The entire chain at its current length would take 5,590,000 / 64 * 12 = 1,048,125 seconds (~12.13 days).
Notably, Lighthouse currently backfills the entire chain, although we might move to MIN_EPOCHS_FOR_BLOCK_REQUESTS in the future.
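As a quick sanity check of those figures (the constants are just the numbers quoted above; this is not Lighthouse code):

```rust
const SECONDS_PER_SLOT: u64 = 12;
const SLOTS_PER_BATCH: u64 = 64;

/// Seconds to backfill `slots` when processing `batches_per_slot` batches
/// every 12 s slot.
fn backfill_seconds(slots: u64, batches_per_slot: u64) -> u64 {
    slots * SECONDS_PER_SLOT / (SLOTS_PER_BATCH * batches_per_slot)
}

fn main() {
    // MIN_EPOCHS_FOR_BLOCK_REQUESTS: 33,024 epochs = 1,056,768 slots.
    assert_eq!(backfill_seconds(1_056_768, 1), 198_144); // ~2.29 days
    // Whole chain at the length quoted above (~5,590,000 slots).
    assert_eq!(backfill_seconds(5_590_000, 1), 1_048_125); // ~12.13 days
    // Three batches per slot would cut these times to roughly a third.
}
```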
So, my feeling is that we probably want to speed this up by processing multiple batches per slot. I don't have a good feel for how long it takes to process a batch; I think it would be important to check that first. Assuming it's ~500ms, I'd suggest processing batches at the following intervals:
- 6s after slot start.
- 7s after slot start.
- 10s after slot start.
This is very hand-wavy, though. I think we would want to do some more analysis first. These intervals should be enough to get someone started on a solution; we can revisit them later.
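To make the slot-relative scheduling concrete, here is one possible shape for picking wake-up times; `time_until_next_processing` and the offsets are purely illustrative:

```rust
use std::time::Duration;

const SLOT_DURATION: Duration = Duration::from_secs(12);
// The tentative offsets suggested above; purely illustrative.
const PROCESSING_OFFSETS: [Duration; 3] = [
    Duration::from_secs(6),
    Duration::from_secs(7),
    Duration::from_secs(10),
];

/// Given how far we are into the current slot, return how long to wait
/// until the next scheduled backfill-processing point.
fn time_until_next_processing(time_into_slot: Duration) -> Duration {
    for offset in PROCESSING_OFFSETS {
        if offset > time_into_slot {
            return offset - time_into_slot;
        }
    }
    // Past the last offset: wait for the first offset of the next slot.
    SLOT_DURATION - time_into_slot + PROCESSING_OFFSETS[0]
}
```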
Disable rate-limiting
I think we should also provide the option for users to disable backfill rate-limiting. This allows "archive node" users to just sync the entire chain as fast as they can.
I suggest that we enable backfill rate-limiting by default, though.
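That on-by-default behaviour might look something like this (a sketch; the struct, field and any CLI flag wired to it are assumptions, not confirmed Lighthouse options):

```rust
/// Hypothetical configuration shape; the field name and any flag that sets it
/// (e.g. something like `--disable-backfill-rate-limiting`) are illustrative.
pub struct BackfillConfig {
    /// Rate-limiting is on unless the user opts out.
    pub rate_limiting_enabled: bool,
}

impl Default for BackfillConfig {
    fn default() -> Self {
        Self {
            rate_limiting_enabled: true,
        }
    }
}
```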
@divagant-martian has pointed out that backfill batches are actually 2x epochs: https://github.com/sigp/lighthouse/blob/6ac1c5b43951f26f18df8e0b7553fa93c30e0250/beacon_node/network/src/sync/backfill_sync/mod.rs#L35 🙏 I'll update the above comment.
Thanks for the detailed notes @paulhauner!
To help with my understanding, I've created a diagram comparing backfill processing with / without rate-limiting. (thanks @realbigsean for the feedback!)
sequenceDiagram
participant event_rx
participant BeaconProcessor
participant backfill_queue
Title: Existing / Default backfill batch processing
event_rx->>BeaconProcessor: new backfill batch work
alt if worker available
BeaconProcessor->>BeaconProcessor: process backfill batch immediately
else no available worker
BeaconProcessor->>backfill_queue: push to queue
end
loop next loop
alt if worker available
BeaconProcessor-->>backfill_queue: pop from queue
BeaconProcessor->>BeaconProcessor: process backfill batch
end
end
sequenceDiagram
participant event_rx
participant BeaconProcessor
participant backfill_queue as backfill_queue (existing)
participant backfill_scheduled_q as backfill_scheduled_q (new)
participant BackfillScheduler
Title: backfill batch processing with rate-limiting
event_rx->>BeaconProcessor: new backfill batch work
BeaconProcessor->>backfill_scheduled_q: push to a "scheduled" queue
loop At 6, 7, 10 seconds after slot start
BackfillScheduler-->>backfill_scheduled_q: pop work from queue
BackfillScheduler->>event_rx: send scheduled backfill batch work
event_rx->>BeaconProcessor: receive scheduled backfill batch work
end
alt if worker available
BeaconProcessor->>BeaconProcessor: process backfill batch immediately
else no available worker
BeaconProcessor->>backfill_queue: push to queue
end
loop next loop
alt if worker available
BeaconProcessor-->>backfill_queue: pop from queue
BeaconProcessor->>BeaconProcessor: process backfill batch
end
end
This diagram looks perfect! Great job at capturing all of that.
@michaelsproul I've compared the WIP branch (with rate-limiting to 1 batch per slot) against the latest stable version - it does seem to reduce CPU usage (~20% CPU). Looking to increase the number of batches to 3 (6s, 7s, 10s after slot start) and will update the results.
Details on the work https://hackmd.io/@jimmygchen/SJuVpJL3j
WIP branch https://github.com/jimmygchen/lighthouse/pull/4
Hey Jimmy, happy to start the WIP review. Would you mind changing the base to be sigp's lighthouse?
Thanks @divagant-martian! 🙏 PR created here: https://github.com/sigp/lighthouse/pull/3936
Resolved by #3936 :tada: