besu
besu copied to clipboard
Bonsai: SnapSync/FullSync and BlockPropagation race error
Description
On Bonsai on non-PoS networks, when completing snap sync a race exists that can corrupt the state.
When the snapsync pipeline completes, the full sync process starts to catch up to head. If the BlockPropagation manager reacts to a gossip'd block during this time, trie log rolling fails due to a missing trielog:
Acceptance Criteria
- [Criteria 1]
Steps to Reproduce (Bug)
- snap sync a non-PoS network using Bonsai storage format
- once the snap sync completes, especially on networks with a short block time, the block propagation manager attempts to import blocks ahead of the trielog for the current pivot being available.
Expected behavior: [What you expect to happen] no issue post sync
Actual behavior: [What actually happens] trie log rolling fails at the tip of the synced chain due to missing trielog
Frequency: [What percentage of the time does it occur?] varies based on network and block time. Testing locally with clique and 5 second block time this happens about 33% of the time
Logs (if a bug)
2023-10-04 18:22:30.070-07:00 | EthScheduler-Services-42 (importBlock) | INFO | SnapWorldDownloadState | Running world state heal process from peers with pivot block 17083
2023-10-04 18:22:30.073-07:00 | EthScheduler-Services-36 (requestCompleteTask) | INFO | SnapWorldDownloadState | Initiating the healing process for the flat database
2023-10-04 18:22:30.078-07:00 | EthScheduler-Services-36 (requestCompleteTask) | INFO | SnapsyncMetricsManager | Finished worldstate snapsync with nodes 15 (healed=0) duration 0:2,418.
2023-10-04 18:22:31.009-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17095 (0xba970ed3247b9f2faefa462e0f6bbac711801158f19b56c8fa43c18c9b79033c) - 1 saved block(s)
2023-10-04 18:22:31.010-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3 of block 17095 (0xba970ed3247b9f2faefa462e0f6bbac711801158f19b56c8fa43c18c9b79033c)
2023-10-04 18:22:31.021-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17094 (0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3) - 2 saved block(s)
2023-10-04 18:22:31.021-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e of block 17094 (0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3)
2023-10-04 18:22:31.023-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17093 (0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e) - 3 saved block(s)
2023-10-04 18:22:31.023-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9 of block 17093 (0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e)
2023-10-04 18:22:31.024-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17092 (0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9) - 4 saved block(s)
2023-10-04 18:22:31.025-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9 of block 17092 (0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9)
2023-10-04 18:22:31.026-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17091 (0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9) - 5 saved block(s)
2023-10-04 18:22:31.026-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752 of block 17091 (0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9)
2023-10-04 18:22:31.028-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17090 (0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752) - 6 saved block(s)
2023-10-04 18:22:31.028-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d of block 17090 (0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752)
2023-10-04 18:22:31.030-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17089 (0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d) - 7 saved block(s)
2023-10-04 18:22:31.030-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036 of block 17089 (0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d)
2023-10-04 18:22:31.032-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17088 (0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036) - 8 saved block(s)
2023-10-04 18:22:31.033-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8 of block 17088 (0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036)
2023-10-04 18:22:31.035-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17087 (0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8) - 9 saved block(s)
2023-10-04 18:22:31.035-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df of block 17087 (0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8)
2023-10-04 18:22:31.038-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17086 (0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df) - 10 saved block(s)
2023-10-04 18:22:31.038-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49 of block 17086 (0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df)
2023-10-04 18:22:31.041-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Saved announced block for future import 17085 (0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49) - 11 saved block(s)
2023-10-04 18:22:31.041-07:00 | nioEventLoopGroup-3-2 | INFO | BlockPropagationManager | Retrieving parent 0x49454ea5798a66d744663d789c709f31516eda3c19001c72a556ea11e0103c6f of block 17085 (0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49)
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO | BonsaiWorldStateProvider | rolling mutableState blockHash 0x28b1d6e36173a75b905391390022f13eafff0522cd2909dbdc563cb36c2ecd26 to blockhash 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO | BonsaiWorldStateProvider | Rollforward 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO | BonsaiWorldStateProvider | Archive rolling failed for block hash 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
java.util.NoSuchElementException: No value present
at java.base/java.util.Optional.get(Optional.java:143)
at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.rollMutableStateToBlockHash(BonsaiWorldStateProvider.java:218)
at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.getMutable(BonsaiWorldStateProvider.java:184)
at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.getMutable(BonsaiWorldStateProvider.java:160)
at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:124)
at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:72)
at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.importBlock(MainnetBlockImporter.java:45)
at org.hyperledger.besu.ethereum.core.BlockImporter.importBlock(BlockImporter.java:45)
at org.hyperledger.besu.ethereum.eth.sync.tasks.PersistBlockTask.executeTask(PersistBlockTask.java:203)
at org.hyperledger.besu.ethereum.eth.manager.task.AbstractEthTask.executeTaskTimed(AbstractEthTask.java:150)
at org.hyperledger.besu.ethereum.eth.manager.task.AbstractEthTask.run(AbstractEthTask.java:75)
at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.runImportTask(BlockPropagationManager.java:692)
at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.validateAndProcessPendingBlock(BlockPropagationManager.java:671)
at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.lambda$importOrSavePendingBlock$21(BlockPropagationManager.java:617)
at org.hyperledger.besu.util.FutureUtils.propagateResult(FutureUtils.java:98)
at org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$scheduleSyncWorkerTask$0(EthScheduler.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
2023-10-04 18:22:31.048-07:00 | EthScheduler-Workers-0 | INFO | DefaultSynchronizer | Stopping synchronizer
2023-10-04 18:22:31.048-07:00 | EthScheduler-Workers-0 | INFO | FastSyncDownloader | Stopping fast sync
2023-10-04 18:22:31.056-07:00 | EthScheduler-Workers-0 | INFO | DefaultSynchronizer |
####################################################################################################
# #
# Besu has identified a problem with its worldstate database. #
# Your node will fetch the correct data from peers to repair the problem. #
# Starting the sync pipeline... #
# #
####################################################################################################
2023-10-04 18:22:31.057-07:00 | EthScheduler-Workers-0 | INFO | TransactionPoolFactory | Disabling transaction handling during re-sync
Versions (Add all that apply)
- Software version: [
besu --version
] - Java version: [
java -version
] - OS Name & Version: [
cat /etc/*release
] - Kernel Version: [
uname -a
] - Virtual Machine software & version: [
vmware -v
] - Docker Version: [
docker version
] - Cloud VM, type, size: [Amazon Web Services I3-large]
- Consensus Client & Version if using Proof of Stake: [e.g. Teku, Lighthouse, Prysm, Nimbus, Lodestar]
Smart contract information (If you're reporting an issue arising from deploying or calling a smart contract, please supply related information)
- Solidity version [
solc --version
] - Repo with minimal set of deployable/reproducible contract code - please provide a link
- Please include specifics on how you are deploying/calling the contract
- Have you reproduced the issue on other eth clients
Additional Information (Add any of the following or anything else that may be relevant)
- Besu setup info - genesis file, config options
- System info - memory, CPU