
Nullpointer during snapsync "because taskElement is null"

Open • matkt opened this issue • 4 comments

Description

There is an issue when we receive data but its proof is invalid. Because the proof is invalid, the StackTrie is not modified, yet we still try to retrieve the child tasks from it. Before retrieving child requests, we should verify not only that data was retrieved but also that the data is valid.
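The check described above can be sketched as a small, self-contained model. Every type here (`HypotheticalStackTrie`, `HypotheticalRangeRequest`, the simplified `TaskElement` record, `persist`) is a hypothetical stand-in, not Besu's real class or API. The sketch assumes that a response with an invalid proof leaves the trie untouched, so looking up its task element yields `null`, and it assumes a "retry the request" policy for invalid responses. That policy is only an illustration, not the fix the maintainers chose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical, heavily simplified model of the snap-sync step in this issue.
// None of these types are Besu's real classes or APIs.
class SnapGuardDemo {
  static String run() {
    HypotheticalStackTrie trie = new HypotheticalStackTrie();
    Queue<String> children = new ArrayDeque<>();
    Queue<String> retries = new ArrayDeque<>();

    HypotheticalRangeRequest good = new HypotheticalRangeRequest("range-a", trie);
    good.onResponse(List.of("slot1", "slot2"), true); // data with a valid proof
    HypotheticalRangeRequest bad = new HypotheticalRangeRequest("range-b", trie);
    bad.onResponse(List.of("slot3"), false); // data arrives, but the proof fails

    persist(good, children, retries);
    persist(bad, children, retries); // no NPE: the invalid response is routed to retries
    return "children=" + children + " retries=" + retries;
  }

  // Checks validity, not just the presence of data, before asking for child
  // requests. Re-queueing invalid responses is an assumed policy for this sketch.
  static void persist(HypotheticalRangeRequest request, Queue<String> children, Queue<String> retries) {
    if (request.isResponseValid()) {
      children.addAll(request.getChildRequests());
    } else {
      retries.add(request.id());
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}

// Hypothetical stand-in for StackTrie$TaskElement; only proofs() matters here.
record TaskElement(List<String> proofs) {}

class HypotheticalStackTrie {
  private final Map<String, TaskElement> elements = new HashMap<>();

  // A response whose proof is invalid never modifies the trie.
  void addIfProofValid(String requestId, boolean proofValid) {
    if (proofValid) {
      elements.put(requestId, new TaskElement(List.of("proof-" + requestId)));
    }
  }

  // Returns null when nothing was stored for this request.
  TaskElement getElement(String requestId) {
    return elements.get(requestId);
  }
}

class HypotheticalRangeRequest {
  private final String id;
  private final HypotheticalStackTrie trie;
  private boolean dataReceived;
  private boolean proofValid;

  HypotheticalRangeRequest(String id, HypotheticalStackTrie trie) {
    this.id = id;
    this.trie = trie;
  }

  String id() {
    return id;
  }

  void onResponse(List<String> slots, boolean proofValid) {
    this.dataReceived = !slots.isEmpty();
    this.proofValid = proofValid;
    trie.addIfProofValid(id, proofValid);
  }

  // Per the issue, knowing that data arrived is not enough; the proof must be valid too.
  boolean isResponseValid() {
    return dataReceived && proofValid;
  }

  List<String> getChildRequests() {
    TaskElement taskElement = trie.getElement(id);
    // Without the isResponseValid() guard in the caller, this is where the NPE occurs.
    List<String> next = new ArrayList<>();
    for (String proof : taskElement.proofs()) {
      next.add(id + "/child-of-" + proof);
    }
    return next;
  }
}
```

Running `main` prints `children=[range-a/child-of-proof-range-a] retries=[range-b]`: the request with a valid proof yields child requests, while the one with an invalid proof is never asked for children, so `taskElement.proofs()` is never called on `null`.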

2023-11-01 18:39:01.848+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 12158784 of 18477974 (65%)
2023-11-01 18:39:34.237+00:00 | EthScheduler-Services-13 (batchPersistLargeStorageData) | INFO  | Pipeline | Unexpected exception in pipeline. Aborting.
java.lang.NullPointerException: Cannot invoke "org.hyperledger.besu.ethereum.eth.sync.snapsync.StackTrie$TaskElement.proofs()" because "taskElement" is null
	at org.hyperledger.besu.ethereum.eth.sync.snapsync.request.StorageRangeDataRequest.getChildRequests(StorageRangeDataRequest.java:166)
	at org.hyperledger.besu.ethereum.eth.sync.snapsync.PersistDataStep.persist(PersistDataStep.java:51)
	at org.hyperledger.besu.ethereum.eth.sync.snapsync.SnapWorldStateDownloadProcess$Builder.lambda$build$13(SnapWorldStateDownloadProcess.java:314)
	at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31)
	at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38)
	at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:169)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
2023-11-01 18:39:45.599+00:00 | EthScheduler-Services-43 (importBlock) | INFO  | FastImportBlocksStep | Block import progress: 12159184 of 18477974 (65%)

matkt • Nov 02 '23 09:11

@matkt Does this halt the sync?

siladu • Nov 06 '23 22:11

It stops the world-state download but not the blockchain import. So with this kind of error, the blockchain import can reach 99% while the sync never completes unless Besu is restarted.

matkt • Nov 13 '23 22:11
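The behaviour matkt describes, where one pipeline aborts while the other keeps going, can be illustrated with two independent tasks on a shared executor. This is a hypothetical sketch using only standard `java.util.concurrent` classes, not Besu's pipeline code; the class name, the stage bodies, and the block count of 5 are made up for illustration. The assumption modelled here is that nothing resubmits the failed world-state task, which is why it stays stopped until the process restarts.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two independent tasks standing in for the world-state
// download and the block import. This is not Besu's actual pipeline code.
class IndependentPipelinesDemo {
  static String run() throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    AtomicInteger blocksImported = new AtomicInteger();

    // World-state stage: fails once, as in the stack trace, and aborts.
    Runnable worldStateStage = () -> {
      throw new NullPointerException("taskElement is null");
    };
    // Block-import stage: unaffected by the other task's failure.
    Runnable blockImportStage = () -> {
      for (int i = 0; i < 5; i++) {
        blocksImported.incrementAndGet();
      }
    };

    Future<?> worldState = executor.submit(worldStateStage);
    Future<?> blockImport = executor.submit(blockImportStage);

    String worldStateStatus;
    try {
      worldState.get();
      worldStateStatus = "completed";
    } catch (ExecutionException e) {
      // Nothing resubmits this task, so it stays aborted until the process restarts.
      worldStateStatus = "aborted (" + e.getCause().getClass().getSimpleName() + ")";
    }
    try {
      blockImport.get();
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    executor.shutdown();
    return "worldState=" + worldStateStatus + ", blocksImported=" + blocksImported.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run());
  }
}
```

Running it prints `worldState=aborted (NullPointerException), blocksImported=5`: the failure surfaces only through the world-state task's `Future`, and the block-import task finishes normally.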

Hi @pullurib, in light of @matkt's comment on the already-merged PR, we are considering reverting it.

https://github.com/hyperledger/besu/pull/7724#issuecomment-2407425940

It is not clear to me whether the PR introduces a new bug, but it sounds like it isn't a step towards resolving this issue.

Wanted to give you a chance to discuss with @matkt before we revert.

siladu • Oct 21 '24 22:10

Hi @siladu, thanks for pointing it out. I believe @matkt meant that the change doesn't fix the bug but doesn't introduce a new one either. I'll check what else needs to be done on the PR.

pullurib • Oct 23 '24 15:10

I think it's preferable to revert for now while we propose another fix. When I say it doesn't cause a bug, I mean that I believe reaching this condition is rare. However, if it does happen and we return an empty result, it could cause a problem.

matkt • Oct 31 '24 08:10