snarkOS [Fix] Correct quorum threshold calculation in test function is_round

Motivation

This PR updates the quorum threshold calculation in the is_round_reached function to ensure correct Byzantine fault tolerance. The current implementation uses a simple majority calculation, which is insufficient for Byzantine fault tolerance in certain cases.

Current implementation:

let quorum_threshold = self.validators.len() / 2 + 1;

Proposed implementation:

let quorum_threshold = self.validators.len() - (self.validators.len() - 1) / 3;

The new calculation is based on the principle that in a Byzantine fault-tolerant system with N validators, we must be able to tolerate f faulty validators, where N = 3f + 1 + k (0 <= k < 3). The correct quorum threshold is N - f, which our new calculation achieves.

Quorum Threshold Comparison

Comparison between current implementation (n / 2 + 1) and proposed implementation (n - (n - 1) / 3) for various ranges of validator counts (n).

Range: 1 <= n < 10

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
1	1	1	Yes
2	2	2	Yes
3	2	3	No
4	3	3	Yes
5	3	4	No
6	4	5	No
7	4	5	No
8	5	6	No
9	5	7	No

Range: 99 < n < 110

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
100	51	67	No
101	51	68	No
102	52	68	No
103	52	69	No
104	53	70	No
105	53	70	No
106	54	71	No
107	54	72	No
108	55	72	No
109	55	73	No

Range: 999 < n < 1010

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
1000	501	667	No
1001	501	668	No
1002	502	668	No
1003	502	669	No
1004	503	670	No
1005	503	670	No
1006	504	671	No
1007	504	672	No
1008	505	672	No
1009	505	673	No

Key Observations:

For n < 4, both implementations give the same result, but the current implementation is correct only by coincidence.
Starting from n = 3, the proposed implementation consistently requires a higher quorum threshold, which is necessary for Byzantine fault tolerance.
As n increases, the difference between the two implementations becomes more pronounced. The current implementation significantly underestimates the required quorum for larger validator sets.
The current implementation always returns a value close to 50% of n, which is insufficient for Byzantine fault tolerance.
The proposed implementation correctly calculates the Byzantine fault-tolerant quorum (approximately 2/3 of n) for all values of n.

This comparison demonstrates that the proposed implementation provides the correct quorum threshold for Byzantine fault tolerance across all ranges of validator counts, while the current implementation fails to do so in almost all cases where n > 2.

Aug 24 '24 14:08 ungaro

would like to get a review, thanks. @bendyarm @d0cd

Aug 24 '24 14:08 ungaro

This may be trivial as the changes only affect the assumptions of a testing only TestNetwork. @niklaslong Wanted to double check the intention of 2f+1 here.

Jan 24 '25 22:01 raychu86

I don't know the details of this particular testing code, but I confirm, with formal verification certainty, that n-f is the general correct quorum number. 2f+1 only works if n = 3f+1, but it fails (with easy counterexamples) if n = 3f+2 or n = 3f+3. Here f is defined the maximum one that satisfies f < n/3.

Jan 25 '25 01:01 acoglio

@acoglio @raychu86 @niklaslong For more background see my notes on the issue #3377 . I did not fix this one at the time I did PR#2526 because it wasn't clear to me what we were using the test for and there were multiple problems with the test code.

Jan 25 '25 03:01 bendyarm

This is indeed a mistake, thanks for correcting it!

As for why, I'm fairly certain this was leftover from debugging flaky tests with larger numbers of validators—the asynchronous nature of the tests means they still sometimes fail with the corrected bound on debug builds. We also use N = 4 in tests, so thankfully the bound is equivalent there.

Jan 28 '25 16:01 niklaslong

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
100	51	67	No
101	51	68	No
102	52	68	No
103	52	69	No
104	53	70	No
105	53	70	No
106	54	71	No
107	54	72	No
108	55	72	No
109	55	73	No

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
100	51	67	No
101	51	68	No
102	52	68	No
103	52	69	No
104	53	70	No
105	53	70	No
106	54	71	No
107	54	72	No
108	55	72	No
109	55	73	No

[Fix] Correct quorum threshold calculation in test function is_round_reached()

Motivation

Current implementation:

Proposed implementation:

Quorum Threshold Comparison

Range: 1 <= n < 10

Range: 99 < n < 110

Range: 999 < n < 1010

Key Observations:

n	Current (n/2 + 1)	Proposed (n - (n-1)/3)	Correct?
100	51	67	No
101	51	68	No
102	52	68	No
103	52	69	No
104	53	70	No
105	53	70	No
106	54	71	No
107	54	72	No
108	55	72	No
109	55	73	No