cardano-ledger icon indicating copy to clipboard operation
cardano-ledger copied to clipboard

Query a slice of Stake credentials

Open kderme opened this issue 3 years ago • 14 comments

We're looking for a query which extracts the stake distribution from the set snapshot in fixed size slices. This would look something like

startIndex -> resultSize -> LedgerState -> Map StakeCredential (Coin, PoolId)

We plan to execute this query once per block, with resultSize = ~ 1000 in order to get the whole distribution for every epoch.

Some properties it would ideally have:

  • The size of the result will be equal to the resultSize, unless there are no more data.
  • It won't have any duplicates if called for different intervals
  • It will be relatively efficient, since it will probably be executed for thousands of blocks per epoch. Hopefully it won't be linear to the stake or delegation size.
  • Result will be deterministic and repeatable (I guess hard not to be)

Advice on how to write such a query is also welcome.

kderme avatar Feb 15 '22 21:02 kderme

The stake distribution uses a vector map, so lookup should be constant. The ledger could provide a helper function to perform this query.

After there is a ledger helper function, we would need to cut an new version of the node-to-client protocol version and then add a new query there.

Keep in mind that the stake distribution will eventually be moving to disk, so any query we make now regarding the stake distribution will have to be adapted to query a database.

JaredCorduan avatar Feb 15 '22 22:02 JaredCorduan

Additionally is there any efficient helper function to get the number of blocks made in the current epoch? We will most probably define startIndex = resultSize * blocksMade

kderme avatar Feb 16 '22 13:02 kderme

The best I can think of is the DebugChainDepState query in consensus, which includes a mapping of the stake pools to the number of blocks produced by the pool. I know you wouldn't be using the CLI, but as an example, this is the total current number of blocks made:

cardano-cli query protocol-state --mainnet | jq '.csProtocol[0]|add'

JaredCorduan avatar Feb 16 '22 14:02 JaredCorduan

Hm I guess folding this mapping for every block may not be that efficient.

Maybe we can have conditions to stop this folding when blocksCounter * resultSize >= stake.size or directly return an empty Map when size of mapping * resultSize >= stake.size, which will probably be the most common case.

Could we get a second helper function, which computes startIndex internally?

Also we don't really use the consensus queries currently. We directly access the ledger state.

kderme avatar Feb 16 '22 15:02 kderme

@kderme is it easy for db-sync to quickly retrieve the block number of the first block of the epoch and last block of the epoch? those two numbers would determine startIndex.

JaredCorduan avatar Feb 16 '22 16:02 JaredCorduan

This currently seems suprisingly slow

cexplorer=# select min (block_no) from block where epoch_no = 319;
   min   
---------
 6842099
(1 row)

Time: 1172,601 ms (00:01,173)

We would need to implement some form of caching.

kderme avatar Feb 16 '22 21:02 kderme

This currently seems suprisingly slow

Isnt it due to use of min? Wonder if you can use below (since block table is already sorted due to sync process, unless there's a fork - which wouldn't be a concern for block_no at epoch transition):

select block_no from block where epoch_no = 319 LIMIT 1;
 block_no
----------
  6842099
(1 row)

Time: 0.440 ms

select block_no from block where epoch_no = 318 ORDER BY id DESC LIMIT 1;
 block_no
----------
  6842098
(1 row)

Time: 25.030 ms

rdlrt avatar Feb 16 '22 22:02 rdlrt

I wouldn't rely on the auto-incremental block.id, but yes we can find some other query

select count(*) from block where epoch_no = 319;
 count 
-------
  8401
(1 row)

Time: 5,980 ms

So let's go with the initial plan.

kderme avatar Feb 17 '22 12:02 kderme

@kderme can you explain why we shouldn't rely on the auto-incremental block.id? Rollbacks will just mean that we are actually further along retrieving the stake distribution than we though (and the set snapshot is stable at this point, so there is no worry of incorrect data).

JaredCorduan avatar Feb 17 '22 12:02 JaredCorduan

The auto-incremental block.id is managed internally by postgtres, its behaviour changes between postrges versions and there have been reports about it misbehaving https://github.com/input-output-hk/cardano-db-sync/issues/974.

Counting blocks of epoch seems more clear to me and is equivalent to subtracting the blockNo of the first block from the current blockNo + 1.

kderme avatar Feb 17 '22 13:02 kderme

makes perfect sense, thanks, and I agree completely then.

JaredCorduan avatar Feb 17 '22 13:02 JaredCorduan

The auto-incremental block.id is managed internally by postgtres, its behaviour changes between postrges versions and there have been reports about it misbehaving https://github.com/input-output-hk/cardano-db-sync/issues/974.

While I can accept that the auto increment behavior might change between versions, I think its pretty unlikely that it is misbehaving. If there is any misbehavior, I think its almost certainly a problem in db-sync rather than postgres.

erikd avatar Mar 15 '22 00:03 erikd

@kderme can you explain why we shouldn't rely on the auto-incremental block.id?

Because auto generated ids are never reused. That means if we have blocks with id of N, N+1, N+2 and then get a two block rollback, and then a 2 block roll forward, the ids will now be N, N+3, N+4.

erikd avatar Mar 15 '22 06:03 erikd

I've implemented this here, any review would be welcome. I extract a slice from the _delegations of the snapshot and query the _stake for each stake credentials. I'm not sure if the domain of the delegations is a subset of the domain of stakes. If not this property doesn't hold:

The size of the result will be equal to the resultSize, unless there are no more data.

kderme avatar Mar 20 '22 20:03 kderme