lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

#6853 Adding store tests for data column pruning

Open SunnysidedJ opened this issue 9 months ago • 14 comments

Issue Addressed

#6853 Update store tests to cover data column pruning

Proposed Changes

Created a helper function check_data_column_existence which is a copy of check_blob_existence but checking data columns instead. The helper function is then used to check whether data columns are also pruned when blobs are pruned if PeerDAS is enabled.

Additional Info

There were three bugs discovered when writing these tests

  1. Data column info was not updated post fulu
  2. Blob info was continuing to be updated post fulu
  3. delete_if was broken for leveldb

For bug 3, we now seek to the correct column before trying to delete values. The prune_blobs_across_fork_boundary test provides test coverage for this change. When there are both blobs and data columns in the blob store, delete_if properly prunes blobs (or data columns). Without seek blobs would never be pruned as soon as the first data column was inserted into the blob store

SunnysidedJ avatar Mar 31 '25 15:03 SunnysidedJ

Thanks for the PR @SunnysidedJ !

Please see my comments above and let me know if it's unclear or if you have any questions!

jimmygchen avatar Apr 01 '25 06:04 jimmygchen

Major changes:

  • Separated tests for data column pruning.
  • Blob pruning tests only run on Deneb while column ones only run on Fulu with PeerDAS enabled.
  • Found that fork_boundary test is not working (see inline comments).

SunnysidedJ avatar Apr 02 '25 19:04 SunnysidedJ

When I enable Fulu and try blob pruning, the blobs aren't pruned correctly. It'd be appreciated if you could have a look on the commit that reproduces the problem: https://github.com/SunnysidedJ/lighthouse/commit/6217cf9d67e3976a0d7d333f3691c316ababaff3

SunnysidedJ avatar Apr 14 '25 09:04 SunnysidedJ

Some required checks have failed. Could you please take a look @SunnysidedJ? 🙏

mergify[bot] avatar May 19 '25 05:05 mergify[bot]

Hi @SunnysidedJ, this pull request has been closed automatically due to 30 days of inactivity. If you’d like to continue working on it, feel free to reopen at any time.

mergify[bot] avatar May 19 '25 05:05 mergify[bot]

I'm going to take a look at the fulu blob pruning test issues

eserilev avatar May 22 '25 20:05 eserilev

The test is now passing. The reason it was failing was two fold:

  • we werent advancing the slot clock before extending the chain
  • we tried to fork from deneb directly to fulu

I've added an intermediary step to fork to electra and run some checks there. Not sure if theres any additional checks we want to make in this test.

eserilev avatar May 22 '25 21:05 eserilev

Some required checks have failed. Could you please take a look @SunnysidedJ? 🙏

mergify[bot] avatar May 22 '25 22:05 mergify[bot]

ah still need to fix one other test, working on that now

eserilev avatar May 22 '25 22:05 eserilev

a634758 we werent updating data column info after pruning.

eserilev avatar May 23 '25 16:05 eserilev

All required checks have passed and there are no merge conflicts. This pull request may now be ready for another review.

mergify[bot] avatar May 23 '25 17:05 mergify[bot]

All required checks have passed and there are no merge conflicts. This pull request may now be ready for another review.

mergify[bot] avatar May 27 '25 20:05 mergify[bot]

Theres an issue with the way we try to prune blobs after fulu. If there are data columns in the blobs db delete_if, in the leveldb impl, doesn't work as expected

pub fn delete_if(
        &self,
        column: DBColumn,
        mut f: impl FnMut(&[u8]) -> Result<bool, Error>,
    ) -> Result<(), Error> {
        let mut leveldb_batch = Writebatch::new();
        let iter = self.db.iter(self.read_options());

        iter.take_while(move |(key, _)| key.matches_column(column))
            .for_each(|(key, value)| {
                if f(&value).unwrap_or(false) {
                    let _timer = metrics::start_timer(&metrics::DISK_DB_DELETE_TIMES);
                    metrics::inc_counter_vec(&metrics::DISK_DB_DELETE_COUNT, &[column.into()]);
                    leveldb_batch.delete(key);
                }
            });

        self.db.write(self.write_options().into(), &leveldb_batch)?;
        Ok(())
    }

When we fork to fulu there will be two DBColumns in the blob store, BeaconBlob and BeaconDataColumn. Because of the way these columns are serialized, BeaconDataColumn values will always preceed BeaconBlob values in the leveldb blob store. In delete_if when we try to delete BeaconBlob values while BeaconDataColumn values exist in the blob store, iter.take_while will return false on the first key and immediately exit. This issue doesn't occur when using redb beacuse redb has Tables

In the leveldb case I'm thinking for delete_if we should introduce a seek to forward the iterator to the first instance of a given DBColumn. I need to double check other delete_if usage to make sure that doesn't potentially break other things or cause performance degradation.

EDIT:

IT seems we only use delete_if in the blob store, so introducing seek should be fine

eserilev avatar May 31 '25 17:05 eserilev

this should be good to go for a review

eserilev avatar Jun 03 '25 01:06 eserilev

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #7228 has been dequeued. The pull request could not be merged. This could be related to an activated branch protection or ruleset rule that prevents us from merging. (details: 1 review requesting changes and 1 approving review by reviewers with write access.).

You can check the last failing draft PR here: #8207.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it. If you do update this pull request, it will automatically be requeued once the queue conditions match again. If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify[bot] avatar Oct 15 '25 23:10 mergify[bot]

@mergify requeue

michaelsproul avatar Oct 16 '25 00:10 michaelsproul

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

mergify[bot] avatar Oct 16 '25 00:10 mergify[bot]

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #7228 has been dequeued. The pull request could not be merged. This could be related to an activated branch protection or ruleset rule that prevents us from merging. (details: 1 review requesting changes and 1 approving review by reviewers with write access.).

You can check the last failing draft PR here: #8209.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it. If you do update this pull request, it will automatically be requeued once the queue conditions match again. If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify[bot] avatar Oct 16 '25 02:10 mergify[bot]

This pull request has been removed from the queue for the following reason: checks timeout.

The merge conditions cannot be satisfied due to checks timeout:

You can check the last failing draft PR here: #8212.

You may have to fix your CI before adding the pull request to the queue again.

If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again. If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify[bot] avatar Oct 16 '25 07:10 mergify[bot]

@mergify requeue

jimmygchen avatar Oct 16 '25 11:10 jimmygchen

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

mergify[bot] avatar Oct 16 '25 11:10 mergify[bot]

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #7228 has been dequeued. The pull request could not be merged. This could be related to an activated branch protection or ruleset rule that prevents us from merging. (details: 1 review requesting changes and 1 approving review by reviewers with write access.).

You can check the last failing draft PR here: #8220.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it. If you do update this pull request, it will automatically be requeued once the queue conditions match again. If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify[bot] avatar Oct 16 '25 13:10 mergify[bot]