Robustness: DELA network breaks down after 256 rounds
after round 255 the following happens:
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z INF pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/mod.go:387 > block event addr=dela-worker-0:2000 index=256 root=b9bf9506
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/mod.go:551 > round has started addr=dela-worker-0:2000 index=257
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/blocksync/default.go:230 > received synchronization message addr=dela-worker-0:2000 index=255
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/session/mod.go:398 > relay failed to send error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/session/mod.go:398 > relay failed to send error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/session/mod.go:374 > failed to setup relay error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000 to=dela-worker-2:2000
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z ERR pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/rpc.go:227 > stream to root failed error="rpc error: code = Unknown desc = handler failed to process: failed to verify chain: mismatch from: 'd6daf929' != '1894d9c4'"
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/session/mod.go:389 > parent is closing error="client: rpc error: code = Canceled desc = context canceled" addr=Orchestrator:dela-worker-0:2000
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/blocksync/default.go:124 > announcement failed error="session Orchestrator:dela-worker-0:2000 is closing: Canceled" addr=dela-worker-0:2000
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z WRN pkg/mod/go.dedis.ch/[email protected]/mino/minogrpc/session/mod.go:374 > failed to setup relay error="client: rpc error: code = Canceled desc = context canceled" addr=dela-worker-0:2000 to=dela-worker-3:2000
and then no new transactions can be added to the blockchain anymore (i.e. no new forms, votes, ...)
I do not understand DELA well enough to guess at an answer; the only thing that seems suspicious to me is the following lines in the logs:
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/mod.go:551 > round has started addr=dela-worker-0:2000 index=257
d-voting-dela-worker-0-1 | 2023-09-06T13:09:13Z DBG pkg/mod/go.dedis.ch/[email protected]/core/ordering/cosipbft/blocksync/default.go:230 > received synchronization message addr=dela-worker-0:2000 index=255
whereas before that the index had always been increasing
To reproduce: on a clean install, create a form and add votes up until 256
Congratulations for daring to go beyond 255 blocks, you found a pretty nasty bug 😁.
The symptom that we see is that node-0 rejects the chain it receives from itself during the periodic sync that happens among nodes. When nodes receive a sync message, they first validate the chain of links present in the sync message by checking that the forward and backward links of consecutive entries correspond.
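As a rough illustration of that validation step, here is a minimal sketch. The `link` type, its `from`/`to` fields, and `verifyChain` are hypothetical stand-ins invented for this example; they are not DELA's actual types or function names. The sketch only shows the idea that each link must start where the previous one ended, so a chain presented out of order fails the check.

```go
package main

import (
	"bytes"
	"fmt"
)

// link is a hypothetical, simplified stand-in for a block link:
// it points from the digest of one block to the digest of the next.
type link struct {
	from []byte
	to   []byte
}

// verifyChain checks that every link starts at the digest where the
// previous link ended. It returns an error at the first mismatch.
func verifyChain(links []link) error {
	for i := 1; i < len(links); i++ {
		if !bytes.Equal(links[i-1].to, links[i].from) {
			return fmt.Errorf("link %d does not follow link %d", i, i-1)
		}
	}
	return nil
}

func main() {
	a := link{from: []byte{0}, to: []byte{1}}
	b := link{from: []byte{1}, to: []byte{2}}
	c := link{from: []byte{2}, to: []byte{3}}

	// Links in block order pass the check.
	fmt.Println(verifyChain([]link{a, b, c}))

	// The same links in the wrong order are rejected.
	fmt.Println(verifyChain([]link{a, c, b}))
}
```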
The chain is stored in a key-value store, where each key is the index of the block. When we create the chain of links (which is a lighter version of the blockchain) for validation, we iterate over all keys from the key-value store in sorted key order, which should naturally provide blocks from the genesis to the latest one (the genesis block has index 0, the next block 1, etc.). We are using bbolt for the key-value store, which states that "Bolt stores its keys in byte-sorted order within a bucket".
When we store a block, we compute the key that we use for the key-value store with this function: https://github.com/dedis/dela/blob/4bcfa7981c828b150f7de448a7457aff80e5736e/core/ordering/cosipbft/blockstore/disk.go#L307
func (s *InDisk) makeKey(index uint64) []byte {
	key := make([]byte, 8)
	binary.LittleEndian.PutUint64(key, index)
	return key
}
Can you see why there is a problem after 255?
so what you are saying is that when the index hits 256, the lowest byte wraps back to 0 and the next byte becomes 1, and Bolt no longer sorts the key where we expect it because we are using LittleEndian
so changing it to BigEndian should solve the problem?
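The ordering problem described above can be reproduced without bbolt. The sketch below is an assumption-laden stand-in: `leKey` copies the body of the quoted `makeKey`, and `byteSortedOrder` (a name made up for this example) sorts the keys with `bytes.Compare` to imitate byte-sorted iteration, then decodes them back to indices.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// leKey mirrors the quoted makeKey: the index encoded as 8 little-endian bytes.
func leKey(index uint64) []byte {
	key := make([]byte, 8)
	binary.LittleEndian.PutUint64(key, index)
	return key
}

// byteSortedOrder encodes the indices 0..n, sorts the keys bytewise
// (imitating a byte-sorted key-value store), and returns the indices
// in the order an iteration over the sorted keys would visit them.
func byteSortedOrder(n uint64) []uint64 {
	keys := make([][]byte, 0, n+1)
	for i := uint64(0); i <= n; i++ {
		keys = append(keys, leKey(i))
	}
	sort.Slice(keys, func(a, b int) bool {
		return bytes.Compare(keys[a], keys[b]) < 0
	})
	order := make([]uint64, len(keys))
	for i, k := range keys {
		order[i] = binary.LittleEndian.Uint64(k)
	}
	return order
}

func main() {
	// Up to index 255 only the first byte varies, so the order is numeric.
	fmt.Println(byteSortedOrder(255)[:4]) // [0 1 2 3]

	// Index 256 encodes as 00 01 00 ..., which sorts right after index 0.
	fmt.Println(byteSortedOrder(256)[:4]) // [0 256 1 2]
}
```

With block 256 showing up right after the genesis block, consecutive links no longer match, which is consistent with the "failed to verify chain" error in the logs.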
(moved this to Dela as the problem is clearly on this side)
so changing it to BigEndian should solve the problem?
yes :)
The fix is included in c4dt/dela