solana icon indicating copy to clipboard operation
solana copied to clipboard

Sets default full snapshot archive interval to 50,000 slots

Open brooksprumo opened this issue 2 years ago • 7 comments

Problem

Right now full snapshots are ~45 GB each, and archiving/packaging takes a significant amount of time. If we can do this less often, that would be a win for everyone. Additionally, this gives more opportunity for a single full snapshot to be "useful". Meaning on restart it's more likely that an existing full snapshot will still be fresh, and only an incremental snapshot will be required to download. For comparison, incremental snapshots usually are in the 1 - 3 GB range.

Soon, the Incremental Accounts Hash feature gate will be enabled on mnb. This reduce the amount of work the accounts hash calculation must perform for incremental snapshots. Increasing the full snapshot interval will mean that we can do the cheaper incremental accounts hash more often (although not a ton more).

Summary of Changes

Change the default for --full-snapshot-interval-slots to 50,000.

brooksprumo avatar Nov 27 '23 22:11 brooksprumo

are there any potential downsides?

what is the difference in peak incremental snapshot size at a full snapshot cadence of 50k from the 1-3GB at 25k?

t-nelson avatar Nov 28 '23 00:11 t-nelson

For comparison, incremental snapshots usually are in the 1 - 3 GB range.

Similar concern as Trent - how big will incrementals be at the end of the 50k full interval? Not that we should be held hostage by poorly performing nodes, but I know some nodes were struggling to get incrementals done in time for that 100 slot interval. Wondering if this change will exacerbate that issue / wonder if we need to bump the incremental interval up in tandem

steviez avatar Nov 28 '23 00:11 steviez

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (564f1a9) 81.9% compared to head (6b339e0) 81.9%. Report is 547 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34237     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         819      819             
  Lines      219380   219380             
=========================================
- Hits       179734   179709     -25     
- Misses      39646    39671     +25     

codecov[bot] avatar Nov 28 '23 00:11 codecov[bot]

what is the difference in peak incremental snapshot size at a full snapshot cadence of 50k from the 1-3GB at 25k?

I've started up a node to gather this information. I'll reply back once it has gone through a full snapshot interval and I can get all the incremental snapshot sizes.

brooksprumo avatar Nov 28 '23 16:11 brooksprumo

The node has now gone through multiple full snapshot intervals. The size of incremental snapshots over this interval go from ~650 MB to ~6.5 GB.

snapshot size

brooksprumo avatar Nov 29 '23 14:11 brooksprumo

Related, here's the Snapshot Packager Service.

Something to notice, yes, by making the full snapshot interval longer, the incremental snapshots do start taking longer to package up. Just before halfway through the full snapshot interval, this node now takes more time to package up an incremental snapshot than the 100 slots default interval.

The SPS implements a priority queue, so if there are multiple outstanding incremental snapshots to package, only the latest will be packaged and the older ones will be dropped. So we aren't doing extra work here in SPS. We are doing a bit extra work in AccountsBackgroundService and AccountsHashVerifier, by snapshotting a bank and then calculating the accounts hash even though this will not end up getting packaged up.

Based on this, I'd be inclined to bump the default incremental snapshot interval to 200 slots. Wdyt?

SnapshotPackagerService

brooksprumo avatar Nov 29 '23 14:11 brooksprumo

The SPS implements a priority queue, so if there are multiple outstanding incremental snapshots to package, only the latest will be packaged and the older ones will be dropped.

is this worded correctly? if we're always preferring the newest request, it sounds like we can get into a situation where we never complete any request. that is, it would make more sense to prioritize a request that is presently being serviced and drop any requests that arrive in the meantime.

So we aren't doing extra work here in SPS. We are doing a bit extra work in AccountsBackgroundService and AccountsHashVerifier, by snapshotting a bank and then calculating the accounts hash even though this will not end up getting packaged up.

what would it take to cancel work all the way up the chain?

Based on this, I'd be inclined to bump the default incremental snapshot interval to 200 slots. Wdyt?

what does it look like in practice?


what problem are we trying to solve again here? all that's been cited is an amorphous "win" afaict

t-nelson avatar Nov 29 '23 17:11 t-nelson

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave

willhickey avatar Mar 03 '24 04:03 willhickey