Remove `DataColumnInfo`

Open eserilev opened this issue 1 year ago • 1 comments

Description

In #6559 we're going ahead and removing BlobInfo. It's a bit of a pain to manage as it requires us to make an atomic transaction across two separate databases. Instead we're going to just iterate across the full blob db and prune blobs past the data availability window (additional info can be found in the linked issue).

I'd like to do the same for DataColumnInfo. Unless there's some additional need for this table that I'm unaware of, I'd like to go ahead and remove it.

eserilev avatar Nov 07 '24 03:11 eserilev

Let's do it 🔥

michaelsproul avatar Nov 07 '24 04:11 michaelsproul

Just noting that we currently use this (incorrectly) while processing RPC requests:

https://github.com/sigp/lighthouse/blob/e5b4983d6baf85770fe4539a565d8a2dd462bc53/beacon_node/network/src/network_beacon_processor/rpc_methods.rs#L1165-L1191

We should update to use the data column custody info, or the data availability boundary as appropriate.

michaelsproul avatar Oct 14 '25 04:10 michaelsproul

I'm actually torn on removing this now. I'm writing down some pros and cons to help me make up my mind.

Pros to removing

  • Remove cross-database transactions for blob/column metadata. Counterpoint: we could do this by storing the metadata in the blobs DB itself.

Cons to removing

  • Users lose all visibility into which blobs/columns are available in the database. While testing the --complete-blob-backfill flag it was quite nice to see progress in the HTTP API. Counterpoint: this can kind of be inferred? If prune-blobs=false we can sort of guess that blobs are available back to oldest_block_slot (although this is not necessarily true), and otherwise we know they are available back to the data availability period?
  • We need a breaking DB migration to remove it now. This creates an incompatibility between v8.0.0-rc.X and v8.0.0 proper, which might catch some users off guard. If we were to keep the metadata and just move it to the blobs DB, then a two-way migration would be possible and we could implement this in v8.1.0 rather than v8.0.0.

michaelsproul avatar Oct 15 '25 02:10 michaelsproul
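
For illustration only, the inference described in the first "con" above could be sketched like this. The function and parameter names are hypothetical and are not Lighthouse APIs; slots are plain `u64` values rather than Lighthouse's own slot type, and, as noted in the comment above, the `prune-blobs=false` branch is only a guess.

```rust
/// Best-effort guess at the earliest slot for which blobs are stored.
///
/// Hypothetical helper, not part of Lighthouse:
/// - `prune_blobs` mirrors the prune-blobs setting,
/// - `oldest_block_slot` is the oldest block slot the node has backfilled to,
/// - `da_boundary_slot` is the first slot of the data availability period.
fn inferred_earliest_blob_slot(
    prune_blobs: bool,
    oldest_block_slot: u64,
    da_boundary_slot: u64,
) -> u64 {
    if prune_blobs {
        // With pruning enabled, blobs are kept back to the DA boundary.
        da_boundary_slot
    } else {
        // Without pruning we can only guess that blobs go back as far as
        // the blocks do; this is not guaranteed to be true.
        oldest_block_slot
    }
}
```

The fact that one branch is a guess is part of the argument for keeping some explicit metadata rather than inferring it.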

Yeah, maybe they do have their use case. For DataColumnInfo the oldest column slot is wrong as soon as the cgc changes, so we'd probably need to fix that.

eserilev avatar Oct 15 '25 05:10 eserilev

Conclusion from our meeting today:

  • Keep BlobInfo. This will soon only be relevant for archive nodes backfilling beyond the DA period (i.e. --complete-blob-backfill). We might want to migrate it to the blobs DB in some future schema upgrade in order to simplify the cross-database transactions.
  • Remove DataColumnInfo in favour of DataColumnCustodyInfo. We have enough information between the DataColumnCustodyInfo and the data availability checker to determine data column availability. For regular non-archive nodes, we know that the earliest available column is either the start of the DA period (if backfill is complete/not running), or the earliest_data_column_slot if backfill is ongoing. In the case of archive nodes, it's similar: either the node has finished backfilling all the way to the Fulu epoch, in which case the earliest_data_column_slot is None and the availability is everything back to Fulu, or backfill is ongoing and data is fully available back to earliest_data_column_slot. One change we might want to make in this area is exposing some unified view of "earliest available data column" via the /lighthouse/database/info API, as this will help users observe backfill.

michaelsproul avatar Oct 16 '25 01:10 michaelsproul
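
As a rough sketch of the "earliest available data column" rule from the conclusion above (not the actual Lighthouse implementation), the four cases could be collapsed into one function. Everything here is an assumption made for illustration: the struct, its field names other than `earliest_data_column_slot`, the function name, and the use of plain `u64` slots. It also assumes that `earliest_data_column_slot` is `Some` only while backfill is ongoing, and that the DA boundary passed in is never earlier than the Fulu fork.

```rust
/// Hypothetical bundle of the inputs named in the discussion above.
#[derive(Debug, Clone, Copy)]
struct ColumnAvailabilityInputs {
    /// First slot of the Fulu fork (data columns do not exist before it).
    fulu_start_slot: u64,
    /// First slot of the current data availability (DA) period.
    /// Assumed to be >= `fulu_start_slot`.
    da_boundary_slot: u64,
    /// Value taken from `DataColumnCustodyInfo`. Assumed to be `Some(slot)`
    /// while backfill is ongoing and `None` when it is complete or not running.
    earliest_data_column_slot: Option<u64>,
    /// True for archive nodes backfilling beyond the DA period.
    archive_node: bool,
}

/// Returns the earliest slot for which data columns are expected to be available.
fn earliest_available_data_column_slot(inputs: &ColumnAvailabilityInputs) -> u64 {
    match inputs.earliest_data_column_slot {
        // Backfill ongoing (regular or archive node): data is available
        // back to the slot that backfill has reached.
        Some(slot) => slot,
        // Archive node with backfill finished: everything back to Fulu.
        None if inputs.archive_node => inputs.fulu_start_slot,
        // Regular node with backfill complete or not running: DA period start.
        None => inputs.da_boundary_slot,
    }
}
```

A single value like this is the kind of unified view that could be surfaced through /lighthouse/database/info so that users can watch backfill progress.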