Query a slice of Stake credentials
We're looking for a query which extracts the stake distribution from the set snapshot in fixed-size slices. This would look something like
startIndex -> resultSize -> LedgerState -> Map StakeCredential (Coin, PoolId)
We plan to execute this query once per block, with resultSize of roughly 1000, in order to get the whole distribution for every epoch.
Some properties it would ideally have:
- The size of the result will be equal to resultSize, unless there is no more data.
- It won't have any duplicates when called for different intervals.
- It will be relatively efficient, since it will probably be executed for thousands of blocks per epoch. Hopefully it won't be linear in the stake or delegation size.
- The result will be deterministic and repeatable (I guess hard not to be).
Advice on how to write such a query is also welcome.
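For illustration, the per-block paging we have in mind would look roughly like the sketch below. It uses a plain Data.Map and made-up names (the passed-in function stands for the requested query, already applied to the ledger state); nothing here is an existing API.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical placeholder types; the real ones live in cardano-ledger.
type StakeCredential = String
type PoolId          = String
type Coin            = Integer

-- One call per block: startIndex advances by resultSize each time, so slices
-- never overlap, and a slice smaller than resultSize signals that the
-- distribution is exhausted.
fetchWholeDistribution
  :: Int                                                -- ^ resultSize, e.g. 1000
  -> (Int -> Int -> Map StakeCredential (Coin, PoolId)) -- ^ the requested query, given startIndex and resultSize
  -> Map StakeCredential (Coin, PoolId)
fetchWholeDistribution resultSize queryStakeSlice = go 0 Map.empty
  where
    go startIndex acc =
      let slice = queryStakeSlice startIndex resultSize
          acc'  = Map.union acc slice
       in if Map.size slice < resultSize
            then acc'
            else go (startIndex + resultSize) acc'
```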
The stake distribution uses a vector map, so lookup should be constant. The ledger could provide a helper function to perform this query.
Once there is a ledger helper function, we would need to cut a new node-to-client protocol version and then add a new query there.
Keep in mind that the stake distribution will eventually be moving to disk, so any query we make now regarding the stake distribution will have to be adapted to query a database.
Additionally, is there an efficient helper function to get the number of blocks made in the current epoch? We will most probably define startIndex = resultSize * blocksMade.
The best I can think of is the DebugChainDepState query in consensus, which includes a mapping of the stake pools to the number of blocks produced by the pool. I know you wouldn't be using the CLI, but as an example, this is the total current number of blocks made:
cardano-cli query protocol-state --mainnet | jq '.csProtocol[0]|add'
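In Haskell that amounts to folding the per-pool counts; a minimal sketch, assuming you have access to the Map that BlocksMade wraps (PoolId below is a placeholder, not the real key hash type):

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Numeric.Natural (Natural)

-- Placeholder for the ledger's stake pool key hash type.
type PoolId = String

-- Equivalent of the jq expression above: sum the per-pool block counts
-- (the map wrapped by BlocksMade) into a single total.
totalBlocksMade :: Map PoolId Natural -> Natural
totalBlocksMade = sum . Map.elems
```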
Hm, I guess folding this mapping for every block may not be that efficient. Maybe we can have a condition to stop this folding when blocksCounter * resultSize >= stake.size, or directly return an empty Map when the size of the mapping * resultSize >= stake.size, which will probably be the most common case.
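Something like the following guard, as a sketch over a plain Map (the names are made up):

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical guard: blocksSoFar * resultSize is the next startIndex, so once
-- it reaches or passes the size of the distribution there is nothing left to
-- return and the fold can be skipped entirely - the common case late in an epoch.
sliceOrEmpty :: Ord k => Int -> Int -> Map k v -> Map k v
sliceOrEmpty blocksSoFar resultSize distr
  | blocksSoFar * resultSize >= Map.size distr = Map.empty
  | otherwise = Map.take resultSize (Map.drop (blocksSoFar * resultSize) distr)
```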
Could we get a second helper function which computes startIndex internally?
Also we don't really use the consensus queries currently. We directly access the ledger state.
@kderme is it easy for db-sync to quickly retrieve the block number of the first block of the epoch and the last block of the epoch? Those two numbers would determine startIndex.
This currently seems surprisingly slow:
cexplorer=# select min (block_no) from block where epoch_no = 319;
min
---------
6842099
(1 row)
Time: 1172,601 ms (00:01,173)
We would need to implement some form of caching.
This currently seems surprisingly slow

Isn't it due to the use of min? Wonder if you can use the queries below (since the block table is already sorted due to the sync process, unless there's a fork - which wouldn't be a concern for block_no at epoch transition):
select block_no from block where epoch_no = 319 LIMIT 1;
block_no
----------
6842099
(1 row)
Time: 0.440 ms
select block_no from block where epoch_no = 318 ORDER BY id DESC LIMIT 1;
block_no
----------
6842098
(1 row)
Time: 25.030 ms
I wouldn't rely on the auto-incremental block.id, but yes, we can find some other query:
select count(*) from block where epoch_no = 319;
count
-------
8401
(1 row)
Time: 5,980 ms
So let's go with the initial plan.
@kderme can you explain why we shouldn't rely on the auto-incremental block.id? Rollbacks will just mean that we are actually further along retrieving the stake distribution than we thought (and the set snapshot is stable at this point, so there is no worry of incorrect data).
The auto-incremental block.id is managed internally by postgres, its behaviour changes between postgres versions, and there have been reports about it misbehaving: https://github.com/input-output-hk/cardano-db-sync/issues/974.
Counting the blocks of the epoch seems clearer to me and is equivalent to subtracting the blockNo of the first block of the epoch from the current blockNo and adding 1.
makes perfect sense, thanks, and I agree completely then.
The auto-incremental block.id is managed internally by postgres, its behaviour changes between postgres versions, and there have been reports about it misbehaving: https://github.com/input-output-hk/cardano-db-sync/issues/974.
While I can accept that the auto-increment behavior might change between versions, I think it's pretty unlikely that it is misbehaving. If there is any misbehavior, I think it's almost certainly a problem in db-sync rather than postgres.
@kderme can you explain why we shouldn't rely on the auto-incremental block.id?
Because auto-generated ids are never reused. That means if we have blocks with ids of N, N+1, N+2 and then get a two-block rollback, followed by a two-block roll forward, the ids will now be N, N+3, N+4.
I've implemented this here, any review would be welcome. I extract a slice from the _delegations of the snapshot and query the _stake for each stake credential. I'm not sure if the domain of the delegations is a subset of the domain of the stakes. If not, this property doesn't hold:
The size of the result will be equal to resultSize, unless there is no more data.
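For reviewers, the approach boils down to something like the sketch below, using a plain Data.Map stand-in (the real snapshot fields are VMaps and all names here are placeholders). The intersection at the end is exactly where the result can fall short of resultSize, if a delegating credential has no entry in _stake:

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Placeholder types; the real ones come from cardano-ledger.
type StakeCredential = String
type PoolId          = String
type Coin            = Integer

-- Slice the delegations, then look up the stake of every credential in the
-- slice.  Credentials that delegate but have no recorded stake are dropped by
-- the intersection, which is why the result may hold fewer than resultSize
-- entries if the two domains differ.
stakeSlice
  :: Int                          -- ^ startIndex
  -> Int                          -- ^ resultSize
  -> Map StakeCredential PoolId   -- ^ _delegations of the set snapshot
  -> Map StakeCredential Coin     -- ^ _stake of the set snapshot
  -> Map StakeCredential (Coin, PoolId)
stakeSlice startIndex resultSize delegations stake =
  let slice = Map.take resultSize (Map.drop startIndex delegations)
   in Map.intersectionWith (\pool coin -> (coin, pool)) slice stake
```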