besu
Avoid doing two validations on transactionsRoot and receiptsRoot during sync
PR description
This PR enhances the sync process by eliminating duplicate validations of the transactions root and receipts root. CPU profiles taken during sync show that the receipts root and transactions root are each calculated twice: first when the data is retrieved from other peers, and again during block import when the block body is validated. This PR implements a mechanism to keep a record of previously validated transactions and receipts roots, so that redundant validations are skipped.
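A minimal sketch of the record-keeping idea described above. All names here are hypothetical, not Besu's actual classes: once a root has been verified when the data arrives from peers, the block hash is remembered, and the import-time validation consults (and clears) that record instead of recomputing the root.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: remembers which blocks already had their
// transactions/receipts roots verified when downloaded from peers.
final class ValidatedRootsTracker {
  private final Set<String> validatedBodies = ConcurrentHashMap.newKeySet();
  private final Set<String> validatedReceipts = ConcurrentHashMap.newKeySet();

  // Called after the download-time validation succeeds.
  void markBodyValidated(final String blockHash) {
    validatedBodies.add(blockHash);
  }

  void markReceiptsValidated(final String blockHash) {
    validatedReceipts.add(blockHash);
  }

  // Called at import time; removes the entry so the set does not grow
  // unboundedly (each block is imported at most once during sync).
  boolean wasBodyValidated(final String blockHash) {
    return validatedBodies.remove(blockHash);
  }

  boolean wasReceiptsValidated(final String blockHash) {
    return validatedReceipts.remove(blockHash);
  }
}
```

The consume-on-lookup pattern keeps the sets small, at the cost of only helping the first import attempt for a given block.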
Sync time improvement (on a 4 vCPU / 32 GiB RAM VM)
Without this PR, checkpoint sync time: 19 hours 54 minutes
With this PR, checkpoint sync time: 16 hours 42 minutes
CPU profiling
The profiling screenshots show that, with this PR, Besu performs the receipts root validation only once, when receiving the receipts from peers.
Before this PR
After this PR
- [ ] I thought about documentation and added the `doc-change-required` label to this PR if updates are required.
- [ ] I thought about the changelog and included a changelog update if required.
- [ ] If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests
The performance improvement is very good, and we definitely need to avoid doing the same work twice, but I want to propose a different implementation that avoids modifying the data structures, and specifically avoids making them mutable as is done here for Block. It is also more future proof, since we can't assume that the validation done when reading from the network will never change.
My alternative proposal is to use a local cache of validation results in the validator class. This approach does not depend on where the validation was performed, and it could also easily be applied to other validations that we might run more than once.
I am available to discuss and help if you think this alternative implementation is a viable option.
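A rough sketch of the cache-based alternative, under the assumption that the validator memoizes its result keyed by block hash. Class and method names are illustrative, not Besu's API:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical memoizing validator: the expensive root check runs once per
// block hash; later calls for the same block are answered from the cache.
final class CachingBodyValidator {
  private static final int MAX_ENTRIES = 1_024;

  // Access-ordered LRU map so the cache stays bounded during a long sync.
  private final Map<String, Boolean> results =
      Collections.synchronizedMap(
          new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(
                final Map.Entry<String, Boolean> eldest) {
              return size() > MAX_ENTRIES;
            }
          });

  boolean validate(final String blockHash, final Supplier<Boolean> expensiveRootCheck) {
    // First call computes and stores the result; repeats are cache hits.
    return results.computeIfAbsent(blockHash, h -> expensiveRootCheck.get());
  }
}
```

Bounding the cache with an LRU eviction policy keeps memory use flat over millions of blocks, while still covering the common case where the download-time and import-time validations happen close together.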
We can discuss another implementation. My point here is to avoid any performance overhead. I thought about keeping track of which header/receipts pairs were already verified, but that means doing another comparison on the header hash (a byte-array comparison), which is expensive.
I think the second proposal (the cache) is interesting. Having all of the validation results in a cache can simplify the code (we only have to modify the validator, not the other parts of the code), and it can also optimize the case where we request the same block several times (I'm not sure we have this case, but maybe on a retry).
@ahamlat is this PR one you're going to come back to or is the outcome to go with a different approach?
@macfarla We haven't decided yet. @fab-10 and @matkt, are you in favor of shipping this PR as it is?
@ahamlat @fab-10 @matkt - can we agree on an implementation? Could we ship this as-is, since the improvements are already substantial, or are you in favor of a small rework?
Can we circle back to this PR and determine whether this is the approach we want to take?
I think a cache based approach, like the one done for #6375 is better
I agree with @fab-10, let me think about how to include a better caching mechanism
@ahamlat is this still relevant/in progress?