besu icon indicating copy to clipboard operation
besu copied to clipboard

Bonsai: SnapSync/FullSync and BlockPropagation race error

Open garyschulte opened this issue 1 year ago • 4 comments

Description

On Bonsai on non-PoS networks, when completing snap sync a race exists that can corrupt the state.

When the snapsync pipeline completes, the full sync process starts to catch up to head. If the BlockPropagation manager reacts to a gossip'd block during this time, trie log rolling fails due to a missing trielog:

Acceptance Criteria

  • [Criteria 1]

Steps to Reproduce (Bug)

  1. snap sync a non-PoS network using Bonsai storage format
  2. once the snap sync completes, especially on networks with a short block time, the block propagation manager attempts to import blocks ahead of the trielog for the current pivot being available.

Expected behavior: [What you expect to happen] no issue post sync

Actual behavior: [What actually happens] trie log rolling fails at the tip of the synced chain due to missing trielog

Frequency: [What percentage of the time does it occur?] varies based on network and block time. Testing locally with clique and 5 second block time this happens about 33% of the time

Logs (if a bug)

2023-10-04 18:22:30.070-07:00 | EthScheduler-Services-42 (importBlock) | INFO  | SnapWorldDownloadState | Running world state heal process from peers with pivot block 17083
2023-10-04 18:22:30.073-07:00 | EthScheduler-Services-36 (requestCompleteTask) | INFO  | SnapWorldDownloadState | Initiating the healing process for the flat database
2023-10-04 18:22:30.078-07:00 | EthScheduler-Services-36 (requestCompleteTask) | INFO  | SnapsyncMetricsManager | Finished worldstate snapsync with nodes 15 (healed=0) duration 0:2,418.
2023-10-04 18:22:31.009-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17095 (0xba970ed3247b9f2faefa462e0f6bbac711801158f19b56c8fa43c18c9b79033c) - 1 saved block(s)
2023-10-04 18:22:31.010-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3 of block 17095 (0xba970ed3247b9f2faefa462e0f6bbac711801158f19b56c8fa43c18c9b79033c)
2023-10-04 18:22:31.021-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17094 (0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3) - 2 saved block(s)
2023-10-04 18:22:31.021-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e of block 17094 (0xf9bc52fdc7ee314f5ea3e955f95319263a8fe330da466dd422dac60b1855eba3)
2023-10-04 18:22:31.023-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17093 (0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e) - 3 saved block(s)
2023-10-04 18:22:31.023-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9 of block 17093 (0x88771787c2d41cadac5ec845badd71da6d6632d03544798ee15a70966c826f0e)
2023-10-04 18:22:31.024-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17092 (0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9) - 4 saved block(s)
2023-10-04 18:22:31.025-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9 of block 17092 (0x9d8962b55b4f2ff02e69558e8ef52b3f69dfd79533075b3b888417a84f1d1cb9)
2023-10-04 18:22:31.026-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17091 (0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9) - 5 saved block(s)
2023-10-04 18:22:31.026-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752 of block 17091 (0x171e9e81cd65b16e0dbdd2e866aa72d5a670d8ec8532e8db184f43ade9efdcc9)
2023-10-04 18:22:31.028-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17090 (0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752) - 6 saved block(s)
2023-10-04 18:22:31.028-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d of block 17090 (0x25bd079593c700b80250e05eace2556c89fd3c45bfd407a40b4b6c92c3230752)
2023-10-04 18:22:31.030-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17089 (0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d) - 7 saved block(s)
2023-10-04 18:22:31.030-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036 of block 17089 (0x38b65af81633f8b029ce35ee48ef7a2eddbd0e5bf55449196c7c449f6cdc533d)
2023-10-04 18:22:31.032-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17088 (0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036) - 8 saved block(s)
2023-10-04 18:22:31.033-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8 of block 17088 (0xdb3ccc878c8f3653ee3c9f897bb585d75e97c007fb8ccad8a07d04c45cb3a036)
2023-10-04 18:22:31.035-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17087 (0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8) - 9 saved block(s)
2023-10-04 18:22:31.035-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df of block 17087 (0x97f4f5357046fdc0e170f96f0ad1c137e3c093e59f3588ebdf0dedf4e252ffa8)
2023-10-04 18:22:31.038-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17086 (0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df) - 10 saved block(s)
2023-10-04 18:22:31.038-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49 of block 17086 (0x92954f4e479d37c0711169ece926fdabd3f3bf66c8a407a57b9c2ede1de0d1df)
2023-10-04 18:22:31.041-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Saved announced block for future import 17085 (0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49) - 11 saved block(s)
2023-10-04 18:22:31.041-07:00 | nioEventLoopGroup-3-2 | INFO  | BlockPropagationManager | Retrieving parent 0x49454ea5798a66d744663d789c709f31516eda3c19001c72a556ea11e0103c6f of block 17085 (0x2cdc4f1edd62d3d8e70db61f1ba3bae2b00530d0895cde423487ef05a0d9fb49)
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO  | BonsaiWorldStateProvider | rolling mutableState blockHash 0x28b1d6e36173a75b905391390022f13eafff0522cd2909dbdc563cb36c2ecd26 to blockhash 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO  | BonsaiWorldStateProvider | Rollforward 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
2023-10-04 18:22:31.046-07:00 | EthScheduler-Workers-0 | INFO  | BonsaiWorldStateProvider | Archive rolling failed for block hash 0x6755091b4bd81ef4994b1b6256ceb01a3fccce8f1626baffcdfb54569f113f3c
java.util.NoSuchElementException: No value present
	at java.base/java.util.Optional.get(Optional.java:143)
	at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.rollMutableStateToBlockHash(BonsaiWorldStateProvider.java:218)
	at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.getMutable(BonsaiWorldStateProvider.java:184)
	at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateProvider.getMutable(BonsaiWorldStateProvider.java:160)
	at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:124)
	at org.hyperledger.besu.ethereum.MainnetBlockValidator.validateAndProcessBlock(MainnetBlockValidator.java:72)
	at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.importBlock(MainnetBlockImporter.java:45)
	at org.hyperledger.besu.ethereum.core.BlockImporter.importBlock(BlockImporter.java:45)
	at org.hyperledger.besu.ethereum.eth.sync.tasks.PersistBlockTask.executeTask(PersistBlockTask.java:203)
	at org.hyperledger.besu.ethereum.eth.manager.task.AbstractEthTask.executeTaskTimed(AbstractEthTask.java:150)
	at org.hyperledger.besu.ethereum.eth.manager.task.AbstractEthTask.run(AbstractEthTask.java:75)
	at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.runImportTask(BlockPropagationManager.java:692)
	at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.validateAndProcessPendingBlock(BlockPropagationManager.java:671)
	at org.hyperledger.besu.ethereum.eth.sync.BlockPropagationManager.lambda$importOrSavePendingBlock$21(BlockPropagationManager.java:617)
	at org.hyperledger.besu.util.FutureUtils.propagateResult(FutureUtils.java:98)
	at org.hyperledger.besu.ethereum.eth.manager.EthScheduler.lambda$scheduleSyncWorkerTask$0(EthScheduler.java:110)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
2023-10-04 18:22:31.048-07:00 | EthScheduler-Workers-0 | INFO  | DefaultSynchronizer | Stopping synchronizer
2023-10-04 18:22:31.048-07:00 | EthScheduler-Workers-0 | INFO  | FastSyncDownloader | Stopping fast sync
2023-10-04 18:22:31.056-07:00 | EthScheduler-Workers-0 | INFO  | DefaultSynchronizer |
####################################################################################################
#                                                                                                  #
# Besu has identified a problem with its worldstate database.                                      #
# Your node will fetch the correct data from peers to repair the problem.                          #
# Starting the sync pipeline...                                                                    #
#                                                                                                  #
####################################################################################################
2023-10-04 18:22:31.057-07:00 | EthScheduler-Workers-0 | INFO  | TransactionPoolFactory | Disabling transaction handling during re-sync

Versions (Add all that apply)

  • Software version: [besu --version]
  • Java version: [java -version]
  • OS Name & Version: [cat /etc/*release]
  • Kernel Version: [uname -a]
  • Virtual Machine software & version: [vmware -v]
  • Docker Version: [docker version]
  • Cloud VM, type, size: [Amazon Web Services I3-large]
  • Consensus Client & Version if using Proof of Stake: [e.g. Teku, Lighthouse, Prysm, Nimbus, Lodestar]

Smart contract information (If you're reporting an issue arising from deploying or calling a smart contract, please supply related information)

  • Solidity version [solc --version]
  • Repo with minimal set of deployable/reproducible contract code - please provide a link
  • Please include specifics on how you are deploying/calling the contract
  • Have you reproduced the issue on other eth clients

Additional Information (Add any of the following or anything else that may be relevant)

  • Besu setup info - genesis file, config options
  • System info - memory, CPU

garyschulte avatar Oct 05 '23 01:10 garyschulte