HoneyBadgerBFT
HoneyBadgerBFT copied to clipboard
Bug in ABA protocol's use of Common Coin
Thanks to Ethan MacBrough (from Ripple Research) for reporting a protocol design error in HoneyBadgerBFT. The error has to do with the use of Threshold-Signature-based Common Coin to instantiate the Common Coin dependency of the ABA routine. MacBrough correctly points out that a flaw in the proof is that the ABA protocol we use relies on a stronger common coin than the one we instantiate (in fact it relies on a common coin that is so strong it cannot be implemented), and in fact provides a concrete attack. Fortunately, the ABA protocol can be readily modified to accommodate our weaker common coin, and hence fix the protocol. The fix suggested by MacBrough is posted at the end of the message.
To close this issue, we need to deploy the following fixes:
- [ ] This is a protocol design flaw, and the construction in the conference paper is currently incorrect. The protocol construction and proof must be updated.
- [x] The constructive attack proposed by MacBrough should be reflected in the test cases
- [x] the ABA implementation must be modified to reflect the new fix
Ethan's note is below:
I thought I should let you know that I found a (probably very minor in practice) issue that can make the binary agreement instantiation used in HoneyBadger potentially never terminate with a single Byzantine node when the network scheduler is actively adversarial. I realized the issue when I was working on Cobalt, wherein I fixed the issue by adding an extra message phase to each round. I just remembered that HoneyBadger uses the same binary consensus mechanism, so I should let you know about the issue.
The basic issue is that Moustefaoi et al.'s consensus algorithm requires a stronger common coin than the instantiation given by Cachin et al. can provide. Namely, they assume that the values of the common coin are completely independent of the state of EVERY honest node at the point where it queries the coin value. This is not guaranteed by Cachin's common coin, since the adversary can reconstruct the coin value after only some of the honest nodes have queried the coin, and then inject "bad state" into the other nodes.
Specifically, there's an attack that can be launched as follows. For the attack I assume the common coin has a threshold of at most 2f+1, where n=3f+1; From what I read, HoneyBadger uses a threshold of only f+1, so the attack still holds. I present the more general argument since 2f+1 is the maximum threshold that can be used (since otherwise f crashed nodes could block coin construction), so the attack shows that a simple modification of the threshold doesn't help.
Let S be the set of all nodes, with
|S|=3f+1
. PartitionS
into 4 subsets,A0
,A1
,B
, andF
; where|A0|=|A1|=|B|=f
, and|F|=1
. Also letA=A0\cup A1
. Suppose every node inA
starts out in roundr
votingv\in\{0,1\}
and every node inB
starts out voting\neg v
. The nodex\in F
is Byzantine.
x
sendsBVAL(\neg v)
to the nodes inA0
andBVAL(v)
to the nodes inA1
. Also, all votes from nodes inB
are delivered to all nodes inA.
Messages withinA0
are delivered. Thus nodes inA0
see|B|+|F|=f+1
votes for\neg v
; so all nodes inA0
broadcastBVAL(\neg v)
and all nodes inA0
see|A0|+|B|+|F|=2f+1
votes for\neg v
; so all nodes inA0
broadcastAUX(\neg v)
.
Then all messages within
A1
are delivered, as well as theBVAL(v)
messages fromA0
toA1
. Thus the nodes inA1
see|A0|+|A1|+|F|=2f+1
votes forv
and broadcastAUX(v)
. After this all messages withinA
are delivered andx
sends bothBVAL(0)
andBVAL(1)
to every node inA
. Thus every node inA
broadcasts bothBVAL(0)
andBVAL(1)
and setsbin_values=\{0,1\}
.
Now all nodes in
A
broadcast their threshold shares over the coin, so since|A|+|F|=2f+1
, the adversary can construct the random coin value s. The nodes inF
sendBVAL(\neg s)
to all the nodes inB
, and all theBVAL(\neg s)
messages from nodes inA
are delivered to all nodes inB
. Thus all the nodes inB
broadcastAUX(\neg s)
. Deliver allAUX(\neg s)
messages; there are2f+1
of them, since either every node inA0
broadcastAUX(\neg s)
or every node inA1
broadcastAUX(\neg s)
. Thus all nodes inB
see2f+1 AUX(\neg s)
messages and get to the end of the round withbin_values=\neg s
. Thus the nodes inB
continue to the next round voting\neg s
while the nodes inA
continue to the next round votings
. At this point all messages from the round are delivered, and the process repeats.
Ethan MacBrough suggests the following fix, which is also used in Cobalt
The exact fix I came up with is, after passing the
AUX
message round, everyone broadcastsCONF(bin_values)
and then waits until there aren-t
CONF(S)
messages withS\subseteq bin_values
. Clearly everyone can eventually get past this, just like theAUX
message round.
Now suppose some node
P_i
gets past theCONF
round withbin_values={v}
. Thent+1
honest nodes must have broadcastCONF({v})
. Thus for any other honest nodeP_j
that gets past theCONF
round,P_j
must have receivedCONF({v})
from some honest node before getting past (since it waits forn-t
CONF
messages). But two honest nodes cannot submitCONF({0})
andCONF({1})
by the security guarantees of theAUX
round, so this means the valuev
was determined beforeP_j
got past theCONF
step, and hence beforeP_j
revealed its threshold share. Thus the adversary learns nothing about the coin value until afterv
is determined, sov=s
with probability1/2
.
As an optimization, after we've received N - f CONF
messages, we should probably use the minimal value set as vals
of which we know that at least one honest node put it into a CONF
, to make termination as likely as possible?
I.e. if we sent CONF({v})
ourselves or we received f + 1 CONF({v})
, we set vals = {v}
. Otherwise vals = {v, w}
. Then if the coin is also v
, we can terminate. (Even if originally we had arrived at {v, w}
, and sent CONF({v, w})
.)
I wonder if it would be safe to use vals = {v}
even if neither of those are true, but if in the meantime we have collected N - f AUX(v)
messages (so now we wouldn't need the other AUX(!v)
messages anymore, and realize we could have sent CONF({v})
) — or would that reopen the doors to something like the original attack again?
Edit: …or would that actually make termination less likely, because we would be less likely to switch to the coin value, like possibly all the other good nodes?
If we can justify both {v}
and {v, w}
with the above arguments, should we use {v}
only if the coin is v
, and otherwise use {v, w}
(i.e. switch)?
Probably there is no right answer, and the best strategy depends on the number of faulty nodes, and the current distribution of estimates. Something like: If the coin is not v
, use whatever the majority of the CONF
messages says? (Never counting invalid CONF
s, of course, with values not in our own bin_values
.)
Edit 2: And would it be worth doing the CONF
round even in epochs with a fixed coin, just because it makes it more likely that we can terminate? It would increase the number of message rounds in the most optimistic case, but since ACS has to wait for all ABA instances, it is as slow as the slowest out of N ABAs.
If we go with the first rule — always use {v}
if you can —, then if we received N - f CONF({v})
messages, we know that at least f + 1 correct nodes sent CONF({v})
and therefore will use vals = {v}
and not switch to !v
, even if the coin is !v
. We therefore know that v
will end up in bin_values
in the next epoch eventually, and could send both BVAL(v)
and AUX(v)
right at the beginning of the next epoch, and put v
into bin_values
right away. If enough nodes do that, it would effectively remove one message round from the next epoch.
There is a bug in the CONF Phase implement https://github.com/initc3/HoneyBadgerBFT-Python/blob/e8bcbc081dfb5d1e7298039d47bbebf7048b8e62/honeybadgerbft/core/binaryagreement.py#L40
The message broadcast on line 40 should be values, which is the output of AUX Phase, instead of bin_values. Otherwise, the security guarantees of the AUX round would not take effect!