
Rate limit historic block backfill

michaelsproul opened this issue 3 years ago

Description

Several users here (https://github.com/sigp/lighthouse/issues/2904) and on Discord have reported their nodes becoming overwhelmed during backfill sync.

We haven't identified a specific issue with backfill sync that could cause this, but we could defensively add a flag to rate-limit backfill sync, to prevent it from overwhelming other functionality.

I think we should probably do the rate-limiting in sync itself, so that it doesn't trip over itself trying to download more blocks than the DB can handle. The rate-limit could maybe be as simple as a configurable delay after each batch, which should give the rest of the node some room to breathe. Someone more familiar with sync would have a better idea (@AgeManning @divagant-martian).
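For illustration only, a minimal sketch of what a "delay after each batch" knob could look like; the `BackfillConfig` struct and names are hypothetical, not Lighthouse's actual configuration:

```rust
use std::time::Duration;

/// Hypothetical configuration knob for backfill sync.
struct BackfillConfig {
    /// How long to pause after each completed batch, if at all.
    batch_delay: Option<Duration>,
}

/// Called once a batch has been fully processed, before requesting the next one.
fn after_batch_processed(config: &BackfillConfig) {
    if let Some(delay) = config.batch_delay {
        // Give the rest of the node some room to breathe before the next batch.
        // Note: sync runs on a non-async thread, so this would be a blocking
        // sleep, which is part of why this idea wasn't pursued (see below).
        std::thread::sleep(delay);
    }
}
```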

michaelsproul avatar May 24 '22 01:05 michaelsproul

Since the beacon processor already handles a sort of load balancing, prioritizing some jobs over others, it would be easier to handle this there. Sync isn't async, so adding delays wouldn't be that simple. Thoughts?

divagant-martian avatar May 24 '22 01:05 divagant-martian

Oh yeah, I hadn't considered that + assumed sync was async

michaelsproul avatar May 24 '22 01:05 michaelsproul

Some thoughts - I don't think we should go too fancy and have dynamic adjustments to load based on processing time in the queue (but we could if someone really wanted to). Some simple ways to approach this:

  • Configurable batch buffer sizes - We have a fixed buffer size, which keeps the processor busy with blocks (depending on its speed). If we set the batch buffer size to 1, for example, then we would process a block and only then start downloading another. This would give some relief.
  • As diva suggested, add a delay to sending back the processed block. Sync will only progress once block processing has completed and the result has been submitted, so if we artificially slow down that process, sync will slow down accordingly.
  • Adjust priorities in the beacon processor. Perhaps we could only process a beacon block from backfill once all other queues are depleted. As backfill is not very important, it should be fine if a block sits there waiting indefinitely; sync will just wait until it eventually goes through (a rough sketch of this follows below).
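A minimal sketch of that third option, assuming the processor exposes its queues as simple FIFOs; the queue names and types here are illustrative stand-ins, not the actual beacon processor internals:

```rust
use std::collections::VecDeque;

/// Stand-in for the beacon processor's unit of work.
struct WorkEvent;

/// Pick the next job for a free worker, strictly preferring everything
/// else over backfill batches.
fn next_work(
    gossip_blocks: &mut VecDeque<WorkEvent>,
    attestations: &mut VecDeque<WorkEvent>,
    backfill_batches: &mut VecDeque<WorkEvent>,
) -> Option<WorkEvent> {
    gossip_blocks
        .pop_front()
        .or_else(|| attestations.pop_front())
        // A backfill batch is only handed out when every other queue is
        // depleted, so it may sit here indefinitely; sync just waits.
        .or_else(|| backfill_batches.pop_front())
}
```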

Just my 2 cents.

AgeManning avatar May 24 '22 05:05 AgeManning

Agree on not doing fancy load balancing or anything like that, just reviewing job priority in the beacon processor. If it's already low priority and it's still interfering with other jobs, lowering the batch size should help. If we "add a delay", I would apply it not to the job but to the queues, throttling how often we start processing them. Given the options, after reviewing/adjusting job priority, I think batch size would be the most helpful.

divagant-martian avatar May 24 '22 05:05 divagant-martian

I'll handle this as we agreed, but I'm curious why this is assumed to be related to backfill sync?

divagant-martian avatar May 24 '22 22:05 divagant-martian

The evidence is pretty much anecdotal: Juan noticed it on a bunch of VPS nodes he manages (https://github.com/sigp/lighthouse/issues/3207#issuecomment-1134521896), and another Discord user was having trouble, but I just checked the message history and it turns out they were running on an HDD :scream:

Ahhh, @divagant-martian you were totally right about the prioritisation though. I think backfill batches count as chain segments so they're being processed first here: https://github.com/sigp/lighthouse/blob/aa72088f8fc91d41106a8afce7a0179cde64ce5d/beacon_node/network/src/beacon_processor/mod.rs#L967-L970

michaelsproul avatar May 24 '22 23:05 michaelsproul

Personal experience: I was using a VPS (now migrated) with an SSD, and I experienced missed attestations during historic block sync after a checkpoint sync. I do believe this move to making that job a lower priority is the right path. What matters most after a sync is continued operation; the backfill will finish when it finishes.

mrabino1 avatar May 25 '22 06:05 mrabino1

Any update here?

jmcruz1983 avatar Nov 30 '22 13:11 jmcruz1983

Yeah. There is more complexity here than we anticipated, so it's not a straightforward fix.

We're currently focusing on developing and testing protocol updates for the next hard-fork so this is currently being left as a lower priority task.

I imagine this only disrupts a small number of nodes, and only during backfill sync. If there is greater urgency or a bigger issue here, we can re-evaluate our priorities.

AgeManning avatar Nov 30 '22 21:11 AgeManning

I wanted to add some notes after a discussion with @divagant-martian and @AgeManning this morning.

Broadly, there are three components involved in back-filling right now:

  • The "networking stack"
    • Determines the required blocks and downloads them from peers.
  • The BeaconProcessor
    • Receives unvalidated blocks from the networking stack and queues them for validation/import.
  • The BeaconChain
    • Receives batches dequeued by the BeaconProcessor and verifies them, ultimately sending a result back to the networking stack so it can get more blocks.

The current consensus for rate-limiting is that it should happen in the BeaconProcessor. This is because the networking stack makes no assumptions about how fast the batches are processed and the BeaconChain makes no assumptions about how frequently it should process backfill batches. Therefore, it seems fine for the BeaconProcessor to receive batches and arbitrarily delay their processing as it sees fit.

The BeaconProcessor is effectively a bunch of FIFO/LIFO queues and a loop routine that pops messages out of an event_rx: mpsc::Receiver<WorkEvent<T>> channel and either (a simplified sketch follows the list):

  1. Processes the event immediately if there is a free "worker".
  2. If there are no free workers, queues the message for later processing.
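A simplified sketch of that loop, assuming tokio; `WorkEvent`, the worker accounting, and the single queue are stand-ins for the real beacon processor's types:

```rust
use std::collections::VecDeque;
use tokio::sync::mpsc;

struct WorkEvent; // stand-in for the real WorkEvent<T>

/// Simplified shape of the dispatch loop described above.
async fn processor_loop(mut event_rx: mpsc::Receiver<WorkEvent>, max_workers: usize) {
    let mut idle_workers = max_workers;
    let mut queued: VecDeque<WorkEvent> = VecDeque::new();

    while let Some(event) = event_rx.recv().await {
        if idle_workers > 0 {
            // 1. A free worker exists: process the event immediately.
            idle_workers -= 1;
            spawn_worker(event);
        } else {
            // 2. No free workers: queue the message for later processing.
            queued.push_back(event);
        }
    }
}

fn spawn_worker(_event: WorkEvent) {
    // In the real processor this hands the work to a blocking task; when the
    // task finishes, a worker becomes idle again and the queues are drained.
}
```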

We want backfill batches to follow a different flow. I believe that flow should look like this:

  1. A backfill batch from the network immediately goes into a newly-added FIFO queue ("newly-added" as in added by the PR that addresses this issue).
  2. A newly-added routine fires at some interval and tries to pop a backfill batch from the FIFO queue (see the sketch after this list).
  3. When a backfill batch is popped from the queue, it is sent to the existing event_rx channel, where the BeaconProcessor will either process it immediately or queue it for processing by the next free worker (we can probably use the existing backfill queue for this).
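A hedged sketch of the newly-added routine in steps 1-3, using a tokio interval; the shared queue handle, `event_tx`, and the names are assumptions about the wiring rather than the actual implementation:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use tokio::sync::mpsc;

struct BackfillBatch; // stand-in for the queued backfill work event

/// Periodically pops at most one backfill batch from the scheduled queue and
/// re-sends it down the normal event channel, where the BeaconProcessor will
/// process it immediately or queue it for the next free worker.
async fn backfill_scheduler(
    scheduled: Arc<Mutex<VecDeque<BackfillBatch>>>,
    event_tx: mpsc::Sender<BackfillBatch>,
    interval: Duration,
) {
    let mut ticker = tokio::time::interval(interval);
    loop {
        ticker.tick().await;
        let batch = scheduled.lock().unwrap().pop_front();
        if let Some(batch) = batch {
            // Sending only fails if the processor has shut down.
            let _ = event_tx.send(batch).await;
        }
    }
}
```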

This solution would allow us to do two things:

  1. Slow the import of batches, thereby reducing the total CPU time spent on backfill per slot.
  2. Perform backfill batch processing at very specific points in time. For example, we could do it half-way through each slot when we know that we're probably not producing/processing blocks/attestations.

Setting the batch import interval

When slowing the import of batches, it's worth considering how slow we're making it. Nodes are required to store at least 33024 epochs (1,056,768 slots, ~5 months) of blocks to serve them to p2p peers (see MIN_EPOCHS_FOR_BLOCK_REQUESTS).

The batch size is currently 64 slots, so if we're processing one batch of blocks per slot then we're looking at a backfill time of 1056768 / 64 * 12 = 198,144 seconds (~2.29 days). The entire chain at its current length would take 5590000 / 64 * 12 = 1,048,125 seconds (12.13 days).
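The same arithmetic as a quick sanity check (the chain-length figure is approximate, as above):

```rust
const SECONDS_PER_SLOT: u64 = 12;
const SLOTS_PER_BATCH: u64 = 64; // 2 epochs per batch

/// MIN_EPOCHS_FOR_BLOCK_REQUESTS expressed in slots (33024 * 32).
const MIN_SLOTS_TO_BACKFILL: u64 = 1_056_768;
/// Approximate length of the whole chain at the time of writing.
const FULL_CHAIN_SLOTS: u64 = 5_590_000;

/// Total backfill time when processing one batch per slot.
fn backfill_seconds(slots: u64) -> u64 {
    slots / SLOTS_PER_BATCH * SECONDS_PER_SLOT
}
// backfill_seconds(MIN_SLOTS_TO_BACKFILL) == 198_144 seconds (~2.3 days)
// backfill_seconds(FULL_CHAIN_SLOTS) == 1_048_116 seconds (~12.1 days)
```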

Notably, Lighthouse currently backfills the entire chain, although we might move to MIN_EPOCHS_FOR_BLOCK_REQUESTS in the future.

So, my feeling is that we probably want to speed this up by processing multiple batches per slot. I don't have a good feeling for how long it takes to process a batch, so I think it would be important to check that first. Assuming it's ~500ms, I'd suggest processing batches at the following intervals (sketched below):

  • 6s after slot start.
  • 7s after slot start.
  • 10s after slot start.
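A sketch of turning those offsets into a wait time, assuming something like the node's slot clock can report how far into the current 12 s slot we are; the helper and names are hypothetical:

```rust
use std::time::Duration;

/// Points within each 12 s slot at which a backfill batch may be processed.
const BACKFILL_OFFSETS: [Duration; 3] = [
    Duration::from_secs(6),
    Duration::from_secs(7),
    Duration::from_secs(10),
];
const SLOT_DURATION: Duration = Duration::from_secs(12);

/// Given how far we are into the current slot, return how long to wait
/// until the next scheduled backfill-processing point.
fn time_until_next_backfill(elapsed_in_slot: Duration) -> Duration {
    for offset in BACKFILL_OFFSETS {
        if elapsed_in_slot < offset {
            return offset - elapsed_in_slot;
        }
    }
    // All of this slot's offsets have passed: wait for the first offset of
    // the next slot.
    SLOT_DURATION - elapsed_in_slot + BACKFILL_OFFSETS[0]
}
```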

This is very hand-wavy, though. I think we would want to do some more analysis first. These intervals should be enough to get someone started on a solution; we can revisit them later.

Disable rate-limiting

I think we should also provide the option for users to disable backfill rate-limiting. This allows "archive node" users to just sync the entire chain as fast as they can.

I suggest that we enable backfill rate-limiting by default, though.
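As a rough idea, the opt-out could be a CLI flag along these lines (a clap v3 derive sketch; the flag name is illustrative rather than a confirmed Lighthouse flag):

```rust
use clap::Parser;

/// Illustrative subset of the beacon node CLI.
#[derive(Parser)]
struct BeaconNodeCli {
    /// Process backfill batches as fast as possible instead of spreading them
    /// across each slot. Intended for "archive node" users syncing the whole chain.
    #[clap(long)]
    disable_backfill_rate_limiting: bool,
}

fn main() {
    let cli = BeaconNodeCli::parse();
    if cli.disable_backfill_rate_limiting {
        // Skip the scheduler entirely and feed batches straight to event_rx.
    }
}
```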

paulhauner avatar Jan 16 '23 23:01 paulhauner

@divagant-martian has pointed out that backfill batches are actually 2x epochs: https://github.com/sigp/lighthouse/blob/6ac1c5b43951f26f18df8e0b7553fa93c30e0250/beacon_node/network/src/sync/backfill_sync/mod.rs#L35 🙏 I'll update the above comment.

paulhauner avatar Jan 17 '23 00:01 paulhauner

Thanks for the detailed notes @paulhauner!

To help with my understanding, I've created a diagram comparing backfill processing with / without rate-limiting. (thanks @realbigsean for the feedback!)

```mermaid
sequenceDiagram
    participant event_rx
    participant BeaconProcessor
    participant backfill_queue
    Title: Existing / Default backfill batch processing
    event_rx->>BeaconProcessor: new backfill batch work
    alt if worker available
        BeaconProcessor->>BeaconProcessor: process backfill batch immediately
    else no available worker
        BeaconProcessor->>backfill_queue: push to queue
    end
    loop next loop
        alt if worker available
            BeaconProcessor-->>backfill_queue: pop from queue
            BeaconProcessor->>BeaconProcessor: process backfill batch
        end
    end
```

```mermaid
sequenceDiagram
    participant event_rx
    participant BeaconProcessor
    participant backfill_queue as backfill_queue (existing)
    participant backfill_scheduled_q as backfill_scheduled_q (new)
    participant BackfillScheduler
    Title: backfill batch processing with rate-limiting
    event_rx->>BeaconProcessor: new backfill batch work
    BeaconProcessor->>backfill_scheduled_q: push to a "scheduled" queue
    loop At 6, 7, 10 seconds after slot start
        BackfillScheduler-->>backfill_scheduled_q: pop work from queue
        BackfillScheduler->>event_rx: send scheduled backfill batch work
        event_rx->>BeaconProcessor: receive scheduled backfill batch work
    end
    alt if worker available
        BeaconProcessor->>BeaconProcessor: process backfill batch immediately
    else no available worker
        BeaconProcessor->>backfill_queue: push to queue
    end
    loop next loop
        alt if worker available
            BeaconProcessor-->>backfill_queue: pop from queue
            BeaconProcessor->>BeaconProcessor: process backfill batch
        end
    end
```

jimmygchen avatar Jan 17 '23 23:01 jimmygchen

This diagram looks perfect! Great job at capturing all of that.

paulhauner avatar Jan 18 '23 00:01 paulhauner

@michaelsproul I've compared the WIP branch (with rate limiting to 1 batch per slot) against the latest stable version - it does seem to reduce the CPU usage (~20% CPU). Looking to increase the number of batches to 3 (6s,7s,10s after slot start) and will update the results.

Details on the work https://hackmd.io/@jimmygchen/SJuVpJL3j

WIP branch https://github.com/jimmygchen/lighthouse/pull/4

jimmygchen avatar Feb 01 '23 00:02 jimmygchen

Hey Jimmy, happy to start the WIP review. Would you mind changing the base to be sigp's lighthouse?

divagant-martian avatar Feb 01 '23 13:02 divagant-martian

Thanks @divagant-martian! 🙏 PR created here: https://github.com/sigp/lighthouse/pull/3936

jimmygchen avatar Feb 01 '23 14:02 jimmygchen

Resolved by #3936 :tada:

michaelsproul avatar May 05 '23 05:05 michaelsproul