Warning: Received unexpected block 30925 for superblock index (consolidated: 6185, current 6197). Delaying synchronization until next epoch.
When trying a testnet node, I found that the node cannot synchronize with the network and shows this warning. The node seems to enter an endless loop and never recovers.
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::chain_manager] State machine is transitioning from WaitingConsensus into Synchronizing
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::chain_manager::mining] Not mining because node is not in Synced state (current state is Synchronizing)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] A SendLastBeacon message is now being forwarded to a random session
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::session] [>] Sending LAST_BEACON: Block: #30691: 714a8be516acd34962e9f031811da58827fbe8234c147ecab82c8b98077dae94 Superblock: #5768: 14acafd93318622e63c8e355e168074c9a14084835a9acbaa3ead0cbd2ba82fa message to session 51.136.60.189:21309 (93 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::session] [<] Received INVENTORY_ANNOUNCEMENT message from session 51.136.60.189:21309 (9205 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::session] [>] Sending INVENTORY_REQUEST message to session 51.136.60.189:21309 (9205 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30692: a5e9857e01238cd7bb1db90462ba8b2fcdc492fdc49a7c4d1ec34dbf94058065 message (559 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30693: 3ff10bc26350ea4fa708fc0a5af921ae09600d8246ac8e2125ada3c94d9abde0 message (560 bytes)
... (more blocks)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30924: 94d34bb5f36b369341cface9f786427f9e370b3239652a2142359dd708450057 message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30925: f1d1da40c76a0a6d3f94673030f8aebaa7e4191987f6ec3dd207eb0be87fd1e8 message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30926: 85862c3447f3e04e79d215f25c2b0f2bca68cb4cd29893a6988b3ec36972b2d9 message (558 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30927: 47abbd4ac91311d870a54f871b5ad26ec555cba4302d0f061ebb7b829da0cab8 message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30928: a4e8b58ae55eecdb6cbc52c8c0f98af533d8a3fba828e445d16ee230a05b6cae message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30929: 8eb1824986e64c6c4bfcea9beb91f85910320aa85b0adcdf49ed7e7827d39c50 message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30930: 1e9aa33b03b60f95b593caee9487d0da8c7f06ff0a909aa6c3f7b27c0a691c8f message (559 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30931: ba19681bc18d847fafd6effda31316da4b98f52872f843a252d67ae24d8647c6 message (559 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30932: e9263a316c43d243eb1c16d3c065b57527e5b6fd5b170362161930308dd3343a message (560 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received BLOCK: #30933: e19ad79b53ccb1f6676989e57e12187266dedabc42ad3b29baa78f1361335be8 message (559 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::sessions_manager::handlers] [<] Received SUPERBLOCK_VOTE twit1d8jdg64qu4ymyc5qvn20lzkd200g74n0n6355g #6186: 8d454a0b54f6a79ea84bbc6a4fbd0d496890dfde7f7cc2d788b67c63249f74bb message (202 bytes)
[2021-06-04T08:15:45Z DEBUG witnet_node::actors::chain_manager::handlers] AddBlocks received while StateMachine is in state Synchronizing
[2021-06-04T08:15:45Z WARN witnet_node::actors::chain_manager::handlers] Received unexpected block 30925 for superblock index (consolidated: 6185, current 6198). Delaying synchronization until next epoch.
We can use this test to reproduce the error:
// node/src/actors/chain_manager/handlers.rs
#[test]
fn test_split_blocks_batch_stuck() {
use BlockBatches::*;
let sync_target = SyncTarget {
block: CheckpointBeacon {
checkpoint: 30933,
hash_prev_block: "e19ad79b53ccb1f6676989e57e12187266dedabc42ad3b29baa78f1361335be8".parse().unwrap(),
},
superblock: CheckpointBeacon {
checkpoint: 6185,
hash_prev_block: "1763d075a93d169ca2085bd3eaa822a5ed120ad87ad4c5187c7a4e498a7ee9da".parse().unwrap(),
},
};
let superblock_period = 5;
let test_split_batch = |provided_blocks, epoch, sync_target: &SyncTarget| {
split_blocks_batch_at_target(
|x| *x,
provided_blocks,
epoch,
&sync_target.clone(),
superblock_period,
)
};
let blocks: Vec<Epoch> = (30692..=30933).collect();
assert_eq!(
test_split_batch(blocks, 31160, &sync_target),
(Err(ChainManagerError::WrongBlocksForSuperblock {
wrong_index: 30925,
consolidated_superblock_index: 6185,
current_superblock_index: 6232,
}))
);
}
It looks like the beacon sent by the other peer is impossible:
let sync_target = SyncTarget {
block: CheckpointBeacon {
checkpoint: 30933,
hash_prev_block: "e19ad79b53ccb1f6676989e57e12187266dedabc42ad3b29baa78f1361335be8".parse().unwrap(),
},
superblock: CheckpointBeacon {
checkpoint: 6185,
hash_prev_block: "1763d075a93d169ca2085bd3eaa822a5ed120ad87ad4c5187c7a4e498a7ee9da".parse().unwrap(),
},
};
Superblock 6185 confirms all the blocks up to 6185*5 - 1 = 30924. Additional blocks are accepted but only up to the next superblock, so only blocks up to 30929 are accepted. However the other peer says that the last block is 30933, which should not be possible because then the last superblock must be 6186, which confirms all the blocks up to 30929. The only case where that can happen is if the superblock 6186 has reverted, but then the blocks 30925-30929 cannot exist. This is why the error says that the block 30925 is wrong: because it shouldn't exist.
@tmpolaczyk So, are we running into some bug? To me, it smells like some sort of data race condition :thinking:
Looking into it, it looks like some nodes of the testnet were stuck on superblock 6184 and some other nodes were stuck on 6185, and for some unknown reason (probably related to synchronization) the blocks from the 6184-chain got into the 6185-chain, leading to this error.