cardano-db-sync
Db sync should possibly avoid trying to fetch non-current pool metadata
OS: Ubuntu
Versions
db-sync version (`cardano-db-sync --version`): 13.6.0.4
PostgreSQL version: 17
Build/Install Method
Method used to build or install cardano-db-sync: downloaded binaries
Run Method
Method used to run cardano-db-sync (eg Nix/Docker/systemd/none): systemd
Problem Report
After analysing the records in the off_chain_pool_fetch_error table and being perplexed for a while by some metadata hash mismatch messages in there, a colleague pointed out that some of the records I was looking at were due to db-sync trying to fetch a previous (not most recent) version of the metadata using a now-outdated URL, hash, or both. After manually purging the off_chain_pool_fetch_error table on my db-sync instances, the errors seen earlier seem to have gone away. This is an example of a log message in question:
230668 | 3085 | 2024-12-31 00:58:47.28307 | 24111 | Hash mismatch when fetching metadata from https://public.bladepool.com/metadata.json. Expected "2738e2233800ab7f82bd2212a9a55f52d4851f9147f161684c63e6655bedb562" but got "d7c25ea70f63c45413d56c35a80293e7dd859233c43c25e1b0cad2738cdfc037". | 51
230652 | 1498 | 2024-12-30 12:04:28.439982 | 27276 | Hash mismatch when fetching metadata from https://raw.githubusercontent.com/Bmtxs/sp/master/na.json. Expected "48cbb69c4384c9847369e89fd693e637236afb174813e05b6464e1cf2aea037d" but got "1df6e0d2b80ba684fbcca263fde20cfe8b5aa7a30ce15ff1fd79a8df2c5840a7". | 49
In both cases the pmr_id column value refers to a pool update record that is not the most recent one, which in my case caused a bit of confusion. So this ticket is primarily to trigger consideration of whether, once a new pmr_id is established for a pool, the retries for previous pmr_ids can be cancelled and some table cleanup performed at that point (unless there is value in retaining the full retry history for previous iterations of the metadata).
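The cleanup policy suggested above can be sketched with a toy in-memory schema. Note this is only illustrative: the table and column names mimic the db-sync ones mentioned in this issue (pmr_id pointing at pool_metadata_ref.id), and using max(id) as a proxy for "most recent pool update" is an assumption; the real schema orders updates by registration transaction.

```python
import sqlite3

# Toy schema loosely mimicking the db-sync tables discussed in this issue.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pool_metadata_ref (
    id INTEGER PRIMARY KEY, pool_id INTEGER, url TEXT, hash TEXT);
CREATE TABLE off_chain_pool_fetch_error (
    id INTEGER PRIMARY KEY, pool_id INTEGER, pmr_id INTEGER,
    fetch_error TEXT, retry_count INTEGER);
-- Pool 51 registered metadata twice; ref 3085 is superseded by ref 4000.
INSERT INTO pool_metadata_ref VALUES
    (3085, 51, 'https://example.com/old.json', 'aa'),
    (4000, 51, 'https://example.com/new.json', 'bb');
INSERT INTO off_chain_pool_fetch_error VALUES
    (1, 51, 3085, 'Hash mismatch ...', 24111),  -- stale: superseded pmr_id
    (2, 51, 4000, 'Connection timeout', 3);     -- current: keep retrying
""")

# The proposed policy: cancel retries (here, delete error rows) for any
# pmr_id that is no longer the pool's latest metadata reference.
db.execute("""
DELETE FROM off_chain_pool_fetch_error AS f
WHERE f.pmr_id <> (SELECT max(r.id) FROM pool_metadata_ref r
                   WHERE r.pool_id = f.pool_id)
""")
remaining = db.execute(
    "SELECT pmr_id FROM off_chain_pool_fetch_error").fetchall()
print(remaining)  # only the current pmr_id 4000 survives
```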
I'm also seeing metadata fetch attempts being made for a pool that retired in epoch 210 (i.e. back in 2020), possibly another small optimisation opportunity, unless this is all by design to keep the database representation of all pools as thorough as possible.
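The retired-pool observation can be sketched the same way. Again the table and column names (pool_retire, retiring_epoch) are assumptions loosely based on db-sync's schema, and a real implementation would need to account for a retired pool re-registering later:

```python
import sqlite3

# Toy schema: which fetch-error retries belong to long-retired pools?
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pool_retire (pool_id INTEGER, retiring_epoch INTEGER);
CREATE TABLE off_chain_pool_fetch_error (
    id INTEGER PRIMARY KEY, pool_id INTEGER, fetch_error TEXT);
INSERT INTO pool_retire VALUES (49, 210);  -- retired back in 2020
INSERT INTO off_chain_pool_fetch_error VALUES
    (1, 49, 'Hash mismatch ...'),
    (2, 51, 'Connection timeout');
""")
current_epoch = 530  # hypothetical current epoch

# Fetch attempts that could be skipped because the pool already retired
# (ignoring, for simplicity, pools that re-register after retiring).
skippable = db.execute("""
SELECT e.id FROM off_chain_pool_fetch_error e
JOIN pool_retire r ON r.pool_id = e.pool_id
WHERE r.retiring_epoch < ?
""", (current_epoch,)).fetchall()
print(skippable)  # [(1,)]
```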
While the immediate task, as aptly put by @kderme, is 'to have a policy that stops fetching attempts when there is a newer pool update', a side question here is whether any thought has been given to adding controls for pool metadata refresh in db-sync itself:
- Ability to blacklist a pool
- Ability to manually refresh a given pool's metadata
I think for years the best practice around pool metadata operations for SPOs has been to update the contents (and thus the metadata hash) whenever the pool makes any metadata change; eg for CNTools we already add a nonce field to ensure users don't submit multiple update entries with the same URL/hash combination. For those who do not follow this practice, perhaps the above could be managed by adding a `status` column to the pool_metadata_ref table (allowing us to blacklist or refresh specific pools):
- If `status` is `success`, we already have an entry in off-chain pool data; do not re-attempt the fetch for that id in future
- If `status` is `failed`, there will be re-attempts for that id (using the current logic with sleeps) until a successful fetch or a new pool update entry
- If `status` is `skip`, db-sync will not attempt fetching the URL for this pool metadata reference ID
- If `status` is `blacklist` (manually set), same as `skip`, but this allows manual control (cannot be overridden) for notorious / bad-ops folks
- If `status` is `refresh`, db-sync will re-attempt fetching metadata, allowing a manual refresh for a given entry (eg if the next polling for this entry is too far out)
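The fetch decision implied by the five proposed status values can be summarised in a small sketch. The status names come from the proposal above; the function and its logic are only illustrative, not an actual db-sync API:

```python
# Hypothetical decision function for the proposed pool_metadata_ref.status
# column: given the current status, should db-sync attempt a fetch?
def should_fetch(status: str) -> bool:
    decisions = {
        "success": False,    # already in off-chain pool data: never re-fetch
        "failed": True,      # keep retrying (with sleeps) until success or
                             # a newer pool update supersedes this ref
        "skip": False,       # operator asked db-sync to ignore this ref
        "blacklist": False,  # like skip, but cannot be overridden
        "refresh": True,     # manual one-off refresh request
    }
    if status not in decisions:
        raise ValueError(f"unknown status: {status}")
    return decisions[status]

fetchable = [s for s in ("success", "failed", "skip", "blacklist", "refresh")
             if should_fetch(s)]
print(fetchable)  # ['failed', 'refresh']
```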
I like the idea of a status field. Currently, we look for corresponding rows in off_chain_pool_data and off_chain_pool_fetch_error, which I don't think will scale much farther.
Since the problem only appears while DBSync is still syncing, I think it's not a big issue.
Using a status field to separate the current pool update from previous ones would be quite useful. For the next major DBSync release we're trying to focus more on live data and separate them from the historic ones; eg this is similar to https://github.com/IntersectMBO/cardano-db-sync/issues/1798
Designing the state machine, across rollbacks and manual intervention, could be tricky.
Delisting pools is already supported via the delisted_pool table, but it may only affect the SMASH server.