Remove `DataColumnInfo`
Description
In #6559 we're going ahead and removing `BlobInfo`. It's a bit of a pain to manage, as it requires us to make an atomic transaction across two separate databases. Instead, we're going to just iterate across the full blob DB and prune blobs past the data availability window (additional info can be found in the linked issue).
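Here is a minimal sketch of the "iterate and prune" approach. It assumes a hypothetical in-memory `HashMap` keyed by block root as a stand-in for the blob DB. The names `StoredBlobs`, `prune_blobs_outside_da_window` and the plain `u64` slot type are illustrative, and the real store's layout and types differ. The key point is that only the blob entries themselves are read and deleted, so no separate metadata table has to be updated in the same transaction.

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte block root used as the key in this sketch.
type Hash256 = [u8; 32];
/// Hypothetical slot type (a plain integer here).
type Slot = u64;

/// Hypothetical value stored per block in the stand-in blob DB.
#[derive(Debug, Clone)]
struct StoredBlobs {
    slot: Slot,
    blobs: Vec<Vec<u8>>,
}

/// Visits every entry of the stand-in blob DB and drops those whose slot is
/// older than the data availability boundary. Returns how many were pruned.
fn prune_blobs_outside_da_window(
    blob_db: &mut HashMap<Hash256, StoredBlobs>,
    da_boundary_slot: Slot,
) -> usize {
    let before = blob_db.len();
    // No separate metadata (such as `BlobInfo`) is read or written here,
    // so there is no second database to keep in sync.
    blob_db.retain(|_root, stored| stored.slot >= da_boundary_slot);
    before - blob_db.len()
}
```

In this sketch the keys are block roots rather than slots, so there is no range delete by slot and every entry has to be visited. That is the cost traded for dropping the cross-database metadata.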
I'd like to do the same for `DataColumnInfo`. Unless there's some additional need for this table that I'm unaware of, I'd like to go ahead and remove it.
Let's do it 🔥
Just noting that we currently use this (incorrectly) while processing RPC requests:
https://github.com/sigp/lighthouse/blob/e5b4983d6baf85770fe4539a565d8a2dd462bc53/beacon_node/network/src/network_beacon_processor/rpc_methods.rs#L1165-L1191
We should update to use the data column custody info, or the data availability boundary as appropriate.
I'm actually torn on removing this now. I'm writing down some pros and cons to help me make up my mind.
Pros to removing
- Remove cross-database transactions for blob/column metadata. Counterpoint: we could do this by storing the metadata in the blobs DB itself.
Cons to removing
- Users lose all visibility into which blobs/columns are available in the database. While testing the `--complete-blob-backfill` flag it was quite nice to see progress in the HTTP API. Counterpoint: this can kind of be inferred? If `prune-blobs=false` we can sort of guess that blobs are available back to `oldest_block_slot` (although this is not necessarily true), and otherwise we know they are available back to the data availability period?
- Removing it now needs a breaking DB migration, which creates an incompatibility between `v8.0.0-rc.X` and `v8.0.0` proper that might catch some users off guard. If we were to keep the metadata and just move it to the blobs DB, then a two-way migration would be possible and we could implement this in v8.1.0 rather than v8.0.0.
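The "kind of inferred" counterpoint can be written down as a rough heuristic. This is a hypothetical sketch: `guess_earliest_blob_slot` is an illustrative name, not an existing function. It also assumes that blobs are never stored for slots older than `oldest_block_slot`. As the bullet itself says, the `prune-blobs=false` branch is only a guess.

```rust
/// Hypothetical slot type (a plain integer here).
type Slot = u64;

/// Rough guess at the earliest slot for which blobs are stored.
/// Not guaranteed to be accurate; it only encodes the heuristic above.
fn guess_earliest_blob_slot(
    prune_blobs: bool,
    oldest_block_slot: Slot,
    da_boundary_slot: Slot,
) -> Slot {
    if prune_blobs {
        // Pruning enabled: nothing older than the DA boundary is kept, and
        // (assumption) nothing older than the oldest stored block either.
        da_boundary_slot.max(oldest_block_slot)
    } else {
        // Pruning disabled: assume blobs reach back as far as blocks do,
        // which is not necessarily true.
        oldest_block_slot
    }
}
```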
Yeah, maybe they do have their use case. For `DataColumnInfo`, the oldest column slot is wrong as soon as the CGC (custody group count) changes, so we'd probably need to fix that.
Conclusion from our meeting today:
- Keep `BlobInfo`. This will soon only be relevant for archive nodes backfilling beyond the DA period (i.e. `--complete-blob-backfill`). We might want to migrate it to the blobs DB in some future schema upgrade in order to simplify the cross-database transactions.
- Remove `DataColumnInfo` in favour of `DataColumnCustodyInfo`. We have enough information between `DataColumnCustodyInfo` and the data availability checker to determine data column availability. For regular non-archive nodes, we know that the earliest available column is either the start of the DA period (if backfill is complete/not running), or the `earliest_data_column_slot` if backfill is on-going. In the case of archive nodes, it's similar: either the node has finished backfilling all the way to the Fulu epoch, in which case `earliest_data_column_slot` is `None` and everything back to Fulu is available, or backfill is on-going and data is fully available back to `earliest_data_column_slot`. One change we might want to make in this area is exposing some unified view of "earliest available data column" via the `/lighthouse/database/info` API, as this will help users observe backfill.
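The availability rule from the meeting can be sketched as a single function. This is a minimal, hypothetical sketch and not Lighthouse's actual code. `CustodyInfoSketch`, `earliest_available_column_slot` and the plain `u64` slot type are illustrative names. The sketch also assumes that `earliest_data_column_slot` is `None` exactly when backfill has finished (or is not running), for both archive and non-archive nodes. The discussion only states this explicitly for archive nodes. A function along these lines could feed both the RPC handling and a unified "earliest available data column" field in the API.

```rust
/// Hypothetical slot type (a plain integer here).
type Slot = u64;

/// Hypothetical, simplified stand-in for `DataColumnCustodyInfo`.
/// Assumption: `Some(slot)` while backfill is on-going, `None` once done.
#[derive(Debug, Clone, Copy)]
struct CustodyInfoSketch {
    earliest_data_column_slot: Option<Slot>,
}

/// Earliest slot from which the node should be able to serve data columns.
fn earliest_available_column_slot(
    info: CustodyInfoSketch,
    is_archive_node: bool,
    da_boundary_slot: Slot,
    fulu_fork_slot: Slot,
) -> Slot {
    // Lowest slot that can ever be available: columns never exist before
    // Fulu, and non-archive nodes prune everything older than the DA boundary.
    let floor = if is_archive_node {
        fulu_fork_slot
    } else {
        da_boundary_slot.max(fulu_fork_slot)
    };
    match info.earliest_data_column_slot {
        // Backfill on-going: columns are only complete from this slot onwards.
        Some(slot) => slot.max(floor),
        // Backfill complete (or not running): everything back to the floor.
        None => floor,
    }
}
```

For example, with Fulu at slot 1,000 and the DA boundary at slot 5,000, a non-archive node that has finished backfill reports 5,000, an archive node that has finished reports 1,000, and any node still backfilling from slot 7,000 reports 7,000.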