solana
solana copied to clipboard
abs: Update last_full_snapshot_slot before calling clean_accounts()
Problem
Accounts Background Service tracks the last full snapshot slot, which is used when calling clean_accounts()
. When handling a full snapshot request, the last_full_snapshot_slot
value is updated after calling clean_accounts()
. This means that zero-lamport accounts will be kept based on the previous full snapshot slot, instead of this current slot. By default, that's 25,000 slots-worth of zero-lamport accounts that are not getting cleaned up when they could be.
Edit to add: The next time clean_accounts()
gets called (after a full snapshot, in the main ABS loop) will correctly clean up those zero-lamport accounts. This change allowed cleaning before taking the snapshot, so the snapshot archive should be a bit smaller if this was able to remove accounts and shrink append vecs.
Summary of Changes
Update last_full_snapshot_slot
before calling clean_accounts()
.
It seems like
clean_accounts
is written with the assumption that a snapshot already exists atlast_full_snapshot_slot
, rather than is being currently created.
Yah! One thing that is not necessarily obvious: the snapshot process is not allowed to fail. If it does, asserts/panics fire and the node will die. So I was thinking that proactively cleaning more zero-lamport accounts should be safe. Since the max_clean_root
will be set to the snapshot bank's slot minus 1, I think this will enable cleaning zero lamport accounts [previous full snapshot slot, this full snapshot slot)
, where this slot
is exclusive
.
I'm trying to convince myself which way is actually right; having your insight is useful to make sure all corner cases are considered. Thanks for reviewing!
do we have any metrics that track uncleaned zero lamport accounts?
do we have any metrics that track uncleaned zero lamport accounts?
Maybe? Not that I'm aware of. That's probably a good thing to check before merging this.
I'm going to run this against mnb for a while to ensure it's safe.
I'm going to run this against mnb for a while to ensure it's safe.
I ran this branch for ~4 days without issue (until the mnb restart happened). I feel OK about its current state. Will monitor for any issues going forward.