polkadot-sdk
Collation fetching fairness
Related to https://github.com/paritytech/polkadot-sdk/issues/1797
When fetching collations in the collator protocol (validator side) we need to ensure that each parachain gets a fair share of core time based on its assignments in the claim queue. This means that the number of collations fetched per parachain should ideally be equal to (and definitely not bigger than) the number of claims for that parachain in the claim queue.
The current implementation doesn't guarantee such fairness. For each relay parent there is a waiting_queue (PerRelayParent -> Collations -> waiting_queue) which holds any unfetched collations advertised to the validator. Collations are fetched on a first-in, first-out basis, which means that if two parachains share a core and one of them is more aggressive it might starve the other. How? At each relay parent up to max_candidate_depth candidates are accepted (enforced in fn is_seconded_limit_reached), so if one of the parachains is quick enough to fill the queue with its advertisements, the validator will never fetch anything from the rest of the parachains despite them being scheduled. This doesn't mean that the aggressive parachain will occupy all of the core time (the runtime guarantees that it can't), but it will prevent the rest of the parachains sharing the same core from having collations backed.
~~The solution I am proposing extends the checks in is_seconded_limit_reached with an additional check.~~ The solution I am proposing is to limit fetches and advertisements based on the state of the claim queue. At each relay parent the claim queue for the core assigned to the validator is fetched. For each parachain a fetch limit is calculated, equal to the number of its entries in the claim queue. Advertisements are not fetched for a parachain which has exceeded its claims in the claim queue. This solves the problem of aggressive parachains advertising too many collations.
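To illustrate the limiting rule, here is a minimal, self-contained sketch. The names and types are simplified stand-ins invented for the example, not the actual collator protocol code.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real para id type.
type ParaId = u32;

/// Returns true if another collation for `para_id` may be fetched, i.e. the number of
/// collations already fetched for it is still below the number of claims it has in the
/// claim queue of the core assigned to this validator.
fn can_fetch_more(
    claim_queue: &[ParaId],
    fetched_per_para: &HashMap<ParaId, usize>,
    para_id: ParaId,
) -> bool {
    let claims = claim_queue.iter().filter(|p| **p == para_id).count();
    let fetched = fetched_per_para.get(&para_id).copied().unwrap_or(0);
    fetched < claims
}

fn main() {
    // A core shared between paras 1 and 2, claim queue [1, 2, 1].
    let claim_queue = vec![1, 2, 1];
    let fetched = HashMap::from([(1, 2)]); // para 1 has already used both of its claims
    assert!(!can_fetch_more(&claim_queue, &fetched, 1)); // further advertisements from para 1 are not fetched
    assert!(can_fetch_more(&claim_queue, &fetched, 2)); // para 2 still has an unsatisfied claim
}
```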
The second part is in the collation fetching logic. The validator keeps track of which collations it has fetched so far. When a new collation needs to be fetched, instead of popping the first entry from the waiting_queue the validator examines the claim queue and looks for the earliest claim which hasn't got a corresponding fetch. This way the validator always tries to prioritise the most urgent entries.
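A rough sketch of that selection rule with made-up names: walk the claim queue from the front, skip claims that already have a corresponding fetch, and pick the first remaining claim for which an advertisement is pending.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real para id type.
type ParaId = u32;

/// Pick the para to fetch for next: the earliest claim in the claim queue that does not
/// yet have a corresponding fetch and for which there is a pending advertisement.
fn next_para_to_fetch(
    claim_queue: &[ParaId],
    fetched_per_para: &HashMap<ParaId, usize>,
    pending_advertisements: &HashMap<ParaId, usize>,
) -> Option<ParaId> {
    let mut remaining_fetches = fetched_per_para.clone();
    for para in claim_queue {
        // A claim that already has a fetch is satisfied; consume one fetch and move on.
        if let Some(n) = remaining_fetches.get_mut(para) {
            if *n > 0 {
                *n -= 1;
                continue;
            }
        }
        // Earliest unsatisfied claim: fetch for it if anything is advertised.
        if pending_advertisements.get(para).copied().unwrap_or(0) > 0 {
            return Some(*para);
        }
    }
    None
}

fn main() {
    let claim_queue = vec![1, 1, 2];
    let fetched = HashMap::from([(1, 1)]); // one collation for para 1 already fetched
    let pending = HashMap::from([(1, 3), (2, 1)]); // several advertisements for para 1, one for para 2
    // The first claim for para 1 is satisfied, the second one isn't, so para 1 is still next.
    assert_eq!(next_para_to_fetch(&claim_queue, &fetched, &pending), Some(1));
}
```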
> The second part is in collation fetching logic. Instead of popping the first entry from the waiting_queue, the validator calculates a score for each entry there. The score is `performed collation fetches for parachain A at relay parent X / number of entries in claim queue for parachain A at relay parent X`. The score will be lower for parachains which have fewer fetches than expected and 0 for parachains which have no fetches at all. This should provide an ordering based on the urgency of each fetch. If two parachains end up with the same score then the one earlier in the claim queue is preferred.
This was my initial idea (now removed from the PR description) and it wasn't a good one. Calculating such a score prioritises parachains which haven't got any fetches, which works well for small claim queues but is biased in the general case. For example, with claim queue [a a a b] and pending advertisements [b a a a], the fetch order will be [a b a a], because at the second fetch para b will have score 0. However this is not optimal since there are two more entries for a in the claim queue.
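To make the bias concrete, here is a tiny worked version of the rejected scoring rule (score = performed fetches / claim queue entries, lower score fetched first) on the example above; purely illustrative.

```rust
// Claim queue: [a, a, a, b]; advertisements arrive as [b, a, a, a].
// Both start with score 0, the tie is broken by claim queue order, so `a` is fetched first.
// After that first fetch: score(a) = 1/3, score(b) = 0/1 = 0, so `b` wins the second
// fetch even though `a` still has two unsatisfied claims, giving the order [a, b, a, a].
fn main() {
    let (claims_a, claims_b) = (3.0_f64, 1.0_f64);
    let fetched_a = 1.0_f64; // fetches for `a` after the first round
    let (score_a, score_b) = (fetched_a / claims_a, 0.0 / claims_b);
    assert!(score_b < score_a); // `b` is picked for the second fetch
}
```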
The implemented approach (tracking what's fetched) solves this problem.
Since we have got core time and the claim queue, should we rework the collator protocol to allow parallel fetches for different para_ids at a single relay parent?
> For each relay parent, we'll allow 3 collations, right? So a total of 9 across the three. In reality we should allow three (irrespective of the relay parents; it may be AAA, ABB, ABC, CCC or any valid relay parent combination that doesn't exhaust the number of claims in the claim queue). Or am I missing something?
You are right. @eskimor pointed it out too and I completely forgot about it.
To implement proper limitation over all relay parents based on entries in the claim queue we need to take two things into account:
- forks
- runtime version (more on this later).
For example, if we receive an advertisement for rp3, we have to start from rp3 and count the advertisements in the last N blocks, where N is min(allowed_ancestry_len, finalized_block_number):
```
rp1 - rp2 - rp3
   \ - rp2' - rp3'
```
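A minimal sketch of this counting for the fork example above, with hypothetical names and plain integers standing in for block hashes. Starting from the advertised relay parent we only walk its own parent links, so the other fork is not counted.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: integers for relay parent hashes and para ids.
type Hash = u64;
type ParaId = u32;

/// Count the collations already fetched for `para` at `relay_parent` and its last
/// `limit - 1` ancestors, where `limit` is min(allowed_ancestry_len, finalized_block_number).
fn fetches_in_ancestry(
    relay_parent: Hash,
    para: ParaId,
    limit: usize,
    parent_of: &HashMap<Hash, Hash>,
    fetched_at: &HashMap<(Hash, ParaId), usize>,
) -> usize {
    let mut total = 0;
    let mut current = Some(relay_parent);
    for _ in 0..limit {
        let Some(rp) = current else { break };
        total += fetched_at.get(&(rp, para)).copied().unwrap_or(0);
        current = parent_of.get(&rp).copied();
    }
    total
}

fn main() {
    // rp1 - rp2 - rp3 plus a fork rp1 - rp2' - rp3' (rp2' = 20, rp3' = 30).
    let parent_of = HashMap::from([(2, 1), (3, 2), (20, 1), (30, 20)]);
    // One fetch for para 7 at rp2 and two at rp2'.
    let fetched_at = HashMap::from([((2, 7), 1), ((20, 7), 2)]);
    // Counting from rp3 walks rp3 - rp2 - rp1 and never sees the rp2' fork.
    assert_eq!(fetches_in_ancestry(3, 7, 3, &parent_of, &fetched_at), 1);
}
```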
However at each relay parent we have to examine the state of the runtime. It can be:
- (case 1): Prospective parachains mode is disabled.
- (case 2): Prospective parachains mode is enabled but the claim queue runtime API is not available.
- (case 3): Prospective parachains mode AND the claim queue runtime API are both available.
After a discussion with @alindima offline I think we can limit the complexity caused by the runtime version by:
(a) removing the usage of ProspectiveParachainsMode in the validator side of the collator protocol.
(b) not using max_candidate_depth as a collation limit at all, as @alindima initially proposed here.
(a) drops (case 1) completely. With (b), (case 2) can be converted to (case 3) by treating the former as a claim queue with len=1 and populating it from the core state (we already do this for assignments). This way we end up with simpler logic.
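A sketch of that fallback with hypothetical names: if the claim queue runtime API is missing, synthesise a length-1 claim queue from the core's current assignment, so the rest of the logic only ever sees (case 3). The CoreAssignment enum below is a made-up stand-in, not the runtime API type.

```rust
// Hypothetical stand-ins for illustration only.
type ParaId = u32;

enum CoreAssignment {
    Scheduled(ParaId),
    Free,
}

/// If the runtime exposes a claim queue use it; otherwise fall back to a length-1
/// queue built from the core assignment, turning (case 2) into (case 3).
fn effective_claim_queue(
    runtime_claim_queue: Option<Vec<ParaId>>,
    core: &CoreAssignment,
) -> Vec<ParaId> {
    match runtime_claim_queue {
        Some(queue) => queue,
        None => match core {
            CoreAssignment::Scheduled(para) => vec![*para],
            CoreAssignment::Free => Vec::new(),
        },
    }
}

fn main() {
    assert_eq!(effective_claim_queue(None, &CoreAssignment::Scheduled(7)), vec![7]);
    assert_eq!(effective_claim_queue(Some(vec![1, 2]), &CoreAssignment::Free), vec![1, 2]);
}
```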
@eskimor what do you think?
AFAICT with the current logic, there will be an overlap in the number of allowed candidates at consecutive relay parents. Say we have a claim queue of size 3 and 3 relay parents A, B and C.
For each relay parent, we'll allow 3 collations, right? So a total of 9 across the three. In reality we should allow three (irrespective of the relay parents; it may be AAA, ABB, ABC, CCC or any valid relay parent combination that doesn't exhaust the number of claims in the claim queue). Or am I missing something?
Another important point I think is that we should also modify the seconded limit in statement distribution (maybe not in this PR but we should have some plan for it). There, we allow `max_candidate_depth + 1` seconded statements from each validator in a backing group. I think we should also take into account the claim queue there. `max_candidate_depth` should actually change its meaning or could be removed altogether. Currently, for elastic scaling, we would configure it as 6 to allow for 3 cores and it would really mean the unincluded candidate count. In this new scheme, we could even remove it. The unincluded candidate count would be (the number of claims in the claim queue) + (the number of candidates pending availability). WDYT @eskimor ?
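As a trivial sketch of that bound (names are hypothetical):

```rust
/// Upper bound on unincluded candidates for a para, as suggested above: its claims in
/// the claim queue plus its candidates already pending availability.
fn unincluded_limit(claims_in_claim_queue: usize, pending_availability: usize) -> usize {
    claims_in_claim_queue + pending_availability
}

fn main() {
    // e.g. 3 claims in the claim queue and 2 candidates pending availability
    assert_eq!(unincluded_limit(3, 2), 5);
}
```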
Indeed. It looks like max_candidate_depth is made completely redundant by the claim queue. In fact the claim queue is a replacement that improves on it, as it already caters to elastic scaling needs.
The claim queue's purpose is to predict optimistically what can be backed; it can never be better, hence allowing more candidates than we have entries in the claim queue does not make any sense: there is no way those candidates will ever be able to be backed.
If we consider the claim queue for each core we also get the total number of blocks that can be prepared in advance, and it scales automatically with the number of cores. So yes, the claim queue seems to be sufficient and able to replace both max_candidate_depth and allowed_ancestry_len. Creating a ticket: https://github.com/paritytech/polkadot-sdk/issues/5079
I've created a follow up issue for the group rotation problem described by @eskimor: https://github.com/paritytech/polkadot-sdk/issues/5754
I see that you reverted to the default lookahead value. Is the mystery of the failing zombienet test resolved?
Yes, turned out there is an issue after all: https://github.com/paritytech/polkadot-sdk/pull/4880#discussion_r1799415826
Hmm. Had a quick look, but not yet following. We still seem to track per relay parent (and per peer): How can we guarantee fairness in such a scheme, given that collators are free in picking relay parents?
We count candidates at relay parent X and all previous relay parents within the view (here).
Why do you say we track per peer? We do have a check in insert_advertisement (here) that a peer doesn't provide more entries than there are elements in the claim queue, but it only serves as initial spam protection so that PeerData doesn't grow indefinitely. We do check the claim queue after that.
Got it, thanks! Was a bit hidden ;-)
Is there a simpler scenario? 🤔
What about a collator that builds two collations, but due to weird network conditions they are delivered in the wrong order at the validator?
Or two different collators that build two candidates at different relay parents and, again due to weird network conditions, the validator gets them out of order?