scylladb
scylladb copied to clipboard
Rack aware view pairing
Enabled with the tablets_rack_aware_view_pairing cluster feature rack-aware pairing pairs base to view replicas that are in the same dc and rack, using their ordinality in the replica map
We distinguish between 2 cases:
-
Simple rack-aware pairing: when the replication factor in the dc is a multiple of the number of racks and the minimum number of nodes per rack in the dc is greater than or equal to rf / nr_racks.
In this case (that includes the single rack case), all racks would have the same number of replicas, so we first filter all replicas by dc and rack, retaining their ordinality in the process, and finally, we pair between the base replicas and view replicas, that are in the same rack, using their original order in the tablet-map replica set.
For example, nr_racks=2, rf=4: base_replicas = { N00, N01, N10, N11 } view_replicas = { N11, N12, N01, N02 } pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 } Note that we don't optimize for self-pairing if it breaks pairing ordinality.
-
Complex rack-aware pairing: when the replication factor is not a multiple of nr_racks. In this case, we attempt best-match pairing in all racks, using the minimum number of base or view replicas in each rack (given their global ordinality), while pairing all the other replicas, across racks, sorted by their ordinality.
For example, nr_racks=4, rf=3: base_replicas = { N00, N10, N20 } view_replicas = { N11, N21, N31 } pairing would be: { N00, N31 }*, { N10, N11 }, { N20, N21 } * cross-rack pair
If we'd simply stable-sort both base and view replicas by rack, we might end up with much worse pairing across racks: { N00, N11 }*, { N10, N21 }*, { N20, N31 }* * cross-rack pair
Fixes scylladb/scylladb#17147
- This is an improvement so no backport is required