besu
Nullpointer during snapsync "because taskElement is null"
Description
There is an issue when we receive data but its proof is invalid. We still try to retrieve the child tasks, even though the StackTrie was not modified because the invalid proof was rejected. Before retrieving child requests, we should verify not only that data was received but also that it is valid.
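A minimal Java sketch of the guard described above follows. It is not Besu's actual code. `StackTrie`, `TaskElement` and `Request` are simplified, hypothetical stand-ins. The sketch assumes that a response with an invalid proof leaves the trie unchanged, so a lookup for that range returns `null`, which is the value that causes the NPE in the log below. It also assumes that such a request should be retried rather than answered with an empty child list. The `retry:`/`child:` strings are placeholders used only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValidProofGuardSketch {

  // Hypothetical stand-in for StackTrie.TaskElement.
  record TaskElement(List<String> proofs) {}

  // Hypothetical stand-in for StackTrie: a segment is stored only if its proof is valid.
  static final class StackTrie {
    private final Map<String, TaskElement> elements = new HashMap<>();

    void addSegment(String startKey, List<String> proofs, boolean proofValid) {
      if (proofValid) {
        elements.put(startKey, new TaskElement(proofs));
      }
      // Invalid proof: the trie is left unmodified, so getElement() returns null.
    }

    TaskElement getElement(String startKey) {
      return elements.get(startKey);
    }
  }

  // Hypothetical request tracking both "data received" and "proof valid".
  static final class Request {
    final String startKey;
    boolean dataReceived;
    boolean proofValid;

    Request(String startKey) {
      this.startKey = startKey;
    }

    void onResponse(StackTrie trie, List<String> proofs, boolean validProof) {
      dataReceived = true;     // data arrived...
      proofValid = validProof; // ...but it may still be invalid
      trie.addSegment(startKey, proofs, validProof);
    }

    // Derive children only when data was received AND its proof was valid.
    // Otherwise, reschedule this request (assumption) instead of touching the
    // null TaskElement or returning an empty list.
    List<String> nextRequests(StackTrie trie) {
      if (!dataReceived || !proofValid) {
        return List.of("retry:" + startKey);
      }
      TaskElement element = trie.getElement(startKey); // non-null: valid proof was stored
      return element.proofs().stream().map(p -> "child:" + p).toList();
    }
  }

  public static void main(String[] args) {
    StackTrie trie = new StackTrie();

    Request valid = new Request("0x00");
    valid.onResponse(trie, List.of("p1", "p2"), true);

    Request invalid = new Request("0x80");
    invalid.onResponse(trie, List.of("p3"), false);

    System.out.println(valid.nextRequests(trie));   // [child:p1, child:p2]
    System.out.println(invalid.nextRequests(trie)); // [retry:0x80]
  }
}
```

The design point is that "data received" alone is not a sufficient precondition for reading the trie. The validity check must gate the lookup, because an invalid proof is exactly the case in which no element is stored.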
2023-11-01 18:39:01.848+00:00 | EthScheduler-Services-43 (importBlock) | INFO | FastImportBlocksStep | Block import progress: 12158784 of 18477974 (65%)
2023-11-01 18:39:34.237+00:00 | EthScheduler-Services-13 (batchPersistLargeStorageData) | INFO | Pipeline | Unexpected exception in pipeline. Aborting.
java.lang.NullPointerException: Cannot invoke "org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie$TaskElement.proofs()" because "taskElement" is null
at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.getChildRequests(StorageRangeDataRequest.java:166)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.PersistDataStep.persist(PersistDataStep.java:51)
at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldStateDownloadProcess$Builder.lambda$build$13(SnapWorldStateDownloadProcess.java:314)
at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31)
at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:169)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
2023-11-01 18:39:45.599+00:00 | EthScheduler-Services-43 (importBlock) | INFO | FastImportBlocksStep | Block import progress: 12159184 of 18477974 (65%)
@matkt Does this halt the sync?
It stops the world state download but not the blockchain download. So with this kind of error, the blockchain can reach 99% while the sync never completes unless we restart Besu.
Hi @pullurib - in light of @matkt's comment on the already merged PR, we are considering reverting it.
https://github.com/hyperledger/besu/pull/7724#issuecomment-2407425940
It is not clear to me whether the PR introduces a new bug, but it sounds like it isn't a step towards resolving this issue.
Wanted to give you a chance to discuss with @matkt before we revert.
Hi @siladu, thanks for pointing it out. I believe @matkt meant that the change doesn't fix the bug but doesn't introduce a new bug either. I'll check what else needs to be done on the PR.
I think it's preferable to revert for now while we propose another fix. When I say it doesn't cause a bug, I mean that I believe this condition is rarely reached. However, if it is reached and we return an empty result, it could create a problem.