snarkOS
[Bug] Bad transaction stuck in mempool
🐛 Bug Report
My node observes miners on testnet2 repeatedly broadcasting blocks that contain a bad transaction:
2022-06-18T06:54:12.716313Z WARN Transaction at170yn5pde2v803l2y3q6dyyuumnfzfa9eruqzuxd7umlp6yjw5q9sfc24dw in block 806182 references non-existent ledger root al1gesxhq6vwwa2xx3uh8w5pcfsg7zh942c3tlqhdn5c8wt8lh5tvqqpu882x
One example of such block:
(I am not sure how to display it, but essentially, besides the coinbase tx, there is another tx that references that ledger root.)
The ledger root al1gesxhq6vwwa2xx3uh8w5pcfsg7zh942c3tlqhdn5c8wt8lh5tvqqpu882x is not in the canonical chain, so I guess the transaction referenced a block that has since become non-canonical, probably because of a network fork.
My theory:
- One node received a transaction
- The transaction passed validation at that time because the referenced ledger root was still in the canonical chain
- The transaction was broadcast to the network and stored in the mempools of other nodes
- The block containing the referenced ledger root then became non-canonical
- Miners mine a block containing the transaction and try to add it to the canonical chain, but fail to do so [1]
- Because the block was not added to the canonical chain, the mempool is not cleared [2]
- Regardless, the new block is still broadcast to the network [3]
As a result, every miner that still has this transaction in its mempool fails to produce new blocks. The network proof rate has indeed been decreasing since I first observed this.
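To make the theory concrete, here is a minimal toy model of the loop I think is happening. It is only a sketch: `Tx`, `Node`, `receive`, `reorg_out` and `mine` are hypothetical names invented for illustration and are not the snarkOS types or API. The only assumption carried over from the code referenced here is that the mempool is cleared only when the mined block is added successfully.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a transaction: an id plus the ledger root it references.
#[derive(Clone, Debug, PartialEq)]
pub struct Tx {
    pub id: String,
    pub ledger_root: String,
}

/// Hypothetical miner state (not the snarkOS ledger).
pub struct Node {
    /// Ledger roots that are currently part of the canonical chain.
    pub canonical_roots: HashSet<String>,
    /// Transactions waiting to be included in a block.
    pub mempool: Vec<Tx>,
    /// Number of blocks this node has successfully added.
    pub height: u32,
}

impl Node {
    /// Accept a transaction only if its ledger root is canonical right now.
    pub fn receive(&mut self, tx: Tx) -> bool {
        let valid = self.canonical_roots.contains(&tx.ledger_root);
        if valid {
            self.mempool.push(tx);
        }
        valid
    }

    /// Model a fork: the given root drops out of the canonical chain.
    pub fn reorg_out(&mut self, root: &str) {
        self.canonical_roots.remove(root);
    }

    /// Build a block from the whole mempool and try to add it.
    /// The mempool is cleared only on success, so one stale
    /// transaction makes every later attempt fail the same way.
    pub fn mine(&mut self) -> Result<u32, String> {
        for tx in &self.mempool {
            if !self.canonical_roots.contains(&tx.ledger_root) {
                return Err(format!(
                    "tx {} references non-existent ledger root {}",
                    tx.id, tx.ledger_root
                ));
            }
        }
        self.mempool.clear();
        self.height += 1;
        Ok(self.height)
    }
}
```

In this model, calling `receive` with a transaction whose root is canonical, then `reorg_out` on that root, makes every later `mine` call return an error while the mempool still holds the transaction and `height` never advances.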
References:
[1] https://github.com/AleoHQ/snarkOS/blob/95d8300c8415648d48d841995d42bb972fe9f871/storage/src/state/ledger.rs#L1023-L1031
[2] ("on success" but add_next_block failed) https://github.com/AleoHQ/snarkOS/blob/95d8300c8415648d48d841995d42bb972fe9f871/network/src/ledger.rs#L521-L530
[3] (add_block has failed but still propagates) https://github.com/AleoHQ/snarkOS/blob/95d8300c8415648d48d841995d42bb972fe9f871/network/src/ledger.rs#L241-L253
Steps to Reproduce
Reproducing this requires generating a transaction that references a ledger root which later becomes non-canonical, so unfortunately I do not currently have a good way to reproduce it.
Expected Behavior
A block that fails validation should not be broadcast to the network.
Invalid transactions should be detected and removed from the mempool.
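A rough sketch of what I mean by both expectations, written against made-up types rather than the real snarkOS code: `PendingTx`, `prune_mempool` and `add_then_broadcast` are hypothetical names, and the "validity" check is reduced to a set lookup on the ledger root. An actual fix would have to hook into the real ledger and peer logic.

```rust
use std::collections::HashSet;

/// Hypothetical pending transaction: an id plus the ledger root it references.
#[derive(Clone, Debug, PartialEq)]
pub struct PendingTx {
    pub id: String,
    pub ledger_root: String,
}

/// Remove every pending transaction whose ledger root is no longer
/// canonical, and return the ids of the transactions that were evicted.
pub fn prune_mempool(
    mempool: &mut Vec<PendingTx>,
    canonical_roots: &HashSet<String>,
) -> Vec<String> {
    let mut evicted = Vec::new();
    mempool.retain(|tx| {
        let valid = canonical_roots.contains(&tx.ledger_root);
        if !valid {
            evicted.push(tx.id.clone());
        }
        valid
    });
    evicted
}

/// Run `broadcast` only if adding the block locally succeeded;
/// otherwise return the error and never call `broadcast`.
pub fn add_then_broadcast<E>(
    add_result: Result<(), E>,
    broadcast: impl FnOnce(),
) -> Result<(), E> {
    add_result?;
    broadcast();
    Ok(())
}
```

Running something like `prune_mempool` after a failed block addition (or after a reorg) would let miners recover on their own, and gating propagation on the result of adding the block would stop invalid blocks from reaching peers.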
Your Environment
snarkOS testnet3 branch