
Avoid doing two validations on transactionsRoot and ReceiptsRoot during sync

Open ahamlat opened this issue 2 years ago • 11 comments

PR description

This PR speeds up sync by eliminating duplicate validations of the transactions root and receipts root. CPU profiles taken during sync show that both roots are computed twice: first when the data is retrieved from peers, and a second time during block import when the block body is validated. This PR keeps a record of transactions roots and receipts roots that have already been validated, so the redundant second validation is skipped.
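A minimal sketch of the idea described above: when a body is downloaded and its root is checked against the header, the result is recorded on the object, so the import pipeline can skip recomputing it. All class and method names here are illustrative, not Besu's actual API, and the roots are plain strings standing in for `Bytes32` values.

```java
import java.util.Objects;

// Hypothetical stand-in for a block body that remembers whether its
// transactions root was already validated when fetched from a peer.
final class ValidatedBody {
    private final String transactionsRoot;   // stand-in for a Bytes32 root
    private boolean transactionsRootValidated;

    ValidatedBody(final String transactionsRoot) {
        this.transactionsRoot = transactionsRoot;
    }

    // Called once, when the body arrives from a peer and is checked
    // against the corresponding header.
    void validateAgainstHeader(final String headerTransactionsRoot) {
        if (Objects.equals(transactionsRoot, headerTransactionsRoot)) {
            transactionsRootValidated = true;
        }
    }

    // Block import consults this flag instead of re-hashing the body.
    boolean needsTransactionsRootCheck() {
        return !transactionsRootValidated;
    }
}

public class Main {
    public static void main(String[] args) {
        ValidatedBody body = new ValidatedBody("0xabc");
        body.validateAgainstHeader("0xabc");
        System.out.println(body.needsTransactionsRootCheck());
    }
}
```

Note that this approach makes the body object mutable, which is part of what the review discussion below pushes back on.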

Sync time improvement (on a 4 vCPU/ 32 GiB RAM VM)

[screenshot: sync time comparison]

Without this PR Checkpoint sync time : 19 hours 54 minutes

With this PR Checkpoint sync time : 16 hours 42 minutes

CPU profiling

The profiling screenshots show that, with this PR, Besu performs the receipts root validation only once, when the receipts are fetched from peers.

Before this PR

[CPU profile screenshot]

After this PR

[CPU profile screenshot]

ahamlat avatar Jul 06 '23 12:07 ahamlat

  • [ ] I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • [ ] I thought about the changelog and included a changelog update if required.
  • [ ] If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

github-actions[bot] avatar Jul 06 '23 12:07 github-actions[bot]

The performance improvement is very good, and we definitely need to avoid doing the same work twice, but I want to propose a different implementation that avoids modifying the data structures, and in particular avoids making them mutable, as this PR does for Block. It is also more future proof, since we can't assume that the validation done when reading from the network will never change. My alternative proposal is to use a local cache of validation results in the validator class. This approach does not depend on where the validation was done, and could easily be applied to other validations that we might run more than once. I am available to discuss and help if you think this alternative implementation is a viable option.
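One way to read this suggestion: keep a small, bounded cache of validation results inside the validator itself, keyed by block hash, so Block and its body stay immutable. The sketch below uses a `LinkedHashMap` in LRU mode as the cache; the class name, the string keys standing in for block hashes, and the `expensiveValidations` counter are all illustrative, not Besu's actual validator API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical validator that caches validation results by block hash,
// so repeated validations of the same block skip the expensive root
// computation. The data structures being validated stay immutable.
final class CachingBodyValidator {
    private static final int MAX_ENTRIES = 1024;

    // LRU cache: block hash -> "roots already validated successfully".
    private final Map<String, Boolean> validated =
        new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, Boolean> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    int expensiveValidations = 0;  // instrumentation for the demo

    boolean validate(final String blockHash) {
        return validated.computeIfAbsent(blockHash, hash -> {
            expensiveValidations++;  // stands in for recomputing the roots
            return true;             // pretend the roots matched the header
        });
    }
}

public class Main {
    public static void main(String[] args) {
        CachingBodyValidator validator = new CachingBodyValidator();
        validator.validate("0xblock1");
        validator.validate("0xblock1");  // cache hit: no recomputation
        System.out.println(validator.expensiveValidations);
    }
}
```

The trade-off raised later in the thread applies here: the cache lookup itself costs a hash-key comparison on the block hash bytes, so the cached result must be cheaper to retrieve than the root computation it replaces.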

fab-10 avatar Jul 07 '23 17:07 fab-10

We can discuss another implementation. My point here is to avoid any performance overhead. I thought about keeping track of which header/receipts pairs were already verified, but that means doing another comparison on the header hash (a byte array comparison), which is expensive.

ahamlat avatar Jul 07 '23 19:07 ahamlat

I think the second proposal (a cache) is interesting. Keeping all of the validation results in a cache would simplify the code (we only have to modify the validator, not the other parts of the code), and it could also be an optimization if we request the same block several times (not sure we have this case, but maybe on a retry).

matkt avatar Jul 10 '23 13:07 matkt

@ahamlat is this PR one you're going to come back to or is the outcome to go with a different approach?

macfarla avatar Sep 14 '23 02:09 macfarla

@macfarla We haven't decided yet. @fab-10 and @matkt, are you in favor of shipping this PR as it is?

ahamlat avatar Sep 21 '23 09:09 ahamlat

@ahamlat @fab-10 @matkt - can we agree on an implementation? Could we ship this as-is, since the improvements are already substantial, or are you in favor of a small rework?

non-fungible-nelson avatar Dec 11 '23 22:12 non-fungible-nelson

Can we circle back to this PR and determine whether this is the approach we want to take?

garyschulte avatar Jan 16 '24 22:01 garyschulte

I think a cache based approach, like the one done for #6375 is better

fab-10 avatar Jan 17 '24 09:01 fab-10

I agree with @fab-10, let me think about how to implement a better caching mechanism.

ahamlat avatar Jan 17 '24 17:01 ahamlat

@ahamlat is this still relevant/in progress?

siladu avatar Mar 01 '24 00:03 siladu