foundry
perf(`anvil`): enhance block mining performance in Anvil node for high throughput and efficiency
Component
Anvil
Describe the feature you would like
I propose a performance enhancement for the Anvil node, specifically targeting the efficiency of block mining. Through some tests I've observed that while Anvil demonstrates impressive transaction processing capabilities, there's a noticeable disparity in throughput efficiency primarily attributed to the time spent mining blocks. This feature request seeks optimizations in Anvil's block mining to reduce execution time, thereby increasing the overall transactions per second (TPS) throughput and making the node more suitable for applications requiring high transaction processing speeds as well as frequent mining of blocks.
Additional context
Anvil version: 0.2.0 (2cf84d9 2024-02-07T00:15:49.622159000Z)
To illustrate the current performance characteristics and provide a basis for this request, I conducted a test using a Uniswap V3 transaction replay script. The findings highlight a significant potential for performance gains in block mining processes. For instance, when increasing `nullSwapsPerBlock` from 1 to 2000, the average TPS improved dramatically (by a factor of about 7), indicating that the node spends a significant portion of its time mining blocks rather than executing transactions. To replicate this test:
- clone this repo anvil-backtester, install deps (`pnpm i`)
- start the anvil node: `pnpm anvil:start`
- run the test script: `pnpm test:anvil-memory` with `nullSwapsPerBlock` set to 1 and then again set to 2000, and observe results similar to the following, indicating significant overhead in mining blocks:
{
blocksToMine: 25,
nullSwapsPerBlock: 1,
totalTxs: 50,
executionTime: 0.084,
averageTPS: 595.2380952380952,
averageTimePerTx: 1.6800000000000002
}
{
blocksToMine: 25,
nullSwapsPerBlock: 2000,
totalTxs: 100000,
executionTime: 24.747,
averageTPS: 4040.8938457186728,
averageTimePerTx: 0.24747000000000002
}
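To put rough numbers on the overhead, here is a minimal sketch that fits the two runs above to a simple linear cost model, `executionTime ≈ blocks * perBlock + txs * perTx`. The model itself is an assumption made for illustration (per-block cost may well grow with the amount of state touched), and the names `Run` and `fitCosts` are hypothetical. Under that assumption the two runs work out to roughly 2.9 ms of fixed cost per block and about 0.25 ms per transaction, which would put block overhead at around 85% of the first run.

```typescript
// Assumed linear cost model (illustrative only, not Anvil's actual behaviour):
//   executionTime ≈ blocksToMine * perBlock + totalTxs * perTx
interface Run {
  blocksToMine: number;
  totalTxs: number;
  executionTime: number; // seconds, as reported by the test script
}

// The two measurements reported in this issue.
const small: Run = { blocksToMine: 25, totalTxs: 50, executionTime: 0.084 };
const large: Run = { blocksToMine: 25, totalTxs: 100000, executionTime: 24.747 };

// Solve the 2x2 linear system for (perBlock, perTx) using Cramer's rule.
function fitCosts(a: Run, b: Run): { perBlock: number; perTx: number } {
  const det = a.blocksToMine * b.totalTxs - b.blocksToMine * a.totalTxs;
  if (det === 0) {
    throw new Error("runs are not independent; cannot fit the model");
  }
  const perBlock =
    (a.executionTime * b.totalTxs - b.executionTime * a.totalTxs) / det;
  const perTx =
    (a.blocksToMine * b.executionTime - b.blocksToMine * a.executionTime) / det;
  return { perBlock, perTx };
}

const { perBlock, perTx } = fitCosts(small, large);

// Fraction of the small run's wall time attributed to fixed per-block cost.
const blockShareSmall = (small.blocksToMine * perBlock) / small.executionTime;

console.log({
  perBlockMs: perBlock * 1000,
  perTxMs: perTx * 1000,
  blockShareSmall,
});
```

With only two data points the fit is exact by construction, so it says nothing about how well the linear model holds; additional runs at intermediate `nullSwapsPerBlock` values would be needed to check that.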
it likely spends most of the time cleaning up / updating old state
could you try with `--prune-history` and see if you notice any difference?
There's definitely room for significant improvements here
@mattsse I am using `--prune-history` in the anvil command as shown below
https://github.com/mshakeg/anvil-backtester/blob/main/shell/anvil.sh
Removing `--prune-history` and `--transaction-block-keeper 4` from the above command does not result in any noticeable changes in performance.
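One way to separate mining cost from transaction execution when comparing flag combinations is to time `evm_mine` calls on their own. The following is only a sketch: `Send` and `timeCalls` are hypothetical names, and the transport is injected so the helper does not assume any particular client library.

```typescript
// A transport is any function that sends one JSON-RPC call and resolves with
// its result. Hypothetical type for this sketch; plug in your own client.
type Send = (method: string, params: unknown[]) => Promise<unknown>;

interface Timing {
  calls: number;
  totalMs: number;
  meanMs: number;
}

// Issue `count` sequential calls of `method` and measure the wall time.
// `now` is injectable so the helper can be exercised with a fake clock.
async function timeCalls(
  send: Send,
  method: string,
  count: number,
  now: () => number = () => Date.now(),
): Promise<Timing> {
  if (count <= 0) {
    throw new Error("count must be positive");
  }
  const start = now();
  for (let i = 0; i < count; i++) {
    // Sequential on purpose: each block is mined before the next request.
    await send(method, []);
  }
  const totalMs = now() - start;
  return { calls: count, totalMs, meanMs: totalMs / count };
}
```

Against a live node, `send` would POST a body such as `{"jsonrpc":"2.0","id":1,"method":"evm_mine","params":[]}` to Anvil's RPC endpoint (`http://127.0.0.1:8545` by default). Running `timeCalls(send, "evm_mine", 25)` once with empty blocks and once after queuing transactions gives a rough picture of how much of each block's time is mining overhead.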
hmm, could you perhaps run this with samply https://github.com/mstange/samply and see if anything sticks out
I'll try to investigate shortly
@mattsse thanks, don't really know what to make of the profile, but I've attached the trace on `evm_mine`, maybe GPT4 could be a source of inspiration :)
Based on this call trace, here are a few points to consider for profiling and improving performance:
- Database Interactions: The `evm_mine` operation involves interactions with an in-memory database. Optimizations here could involve reducing the number of reads and writes, caching frequently accessed data, or improving the database's data structures.
- State Trie Manipulation: There are multiple calls to `trie_db` functions, which indicate manipulation of the state trie. This is an area that typically has a significant impact on performance. Optimizing trie algorithms or using a more efficient trie structure could yield performance improvements.
- Hash Calculations: The `keccak_hasher` and `tiny_keccak` functions suggest that Keccak hashing is part of the operation. Optimizing hashing or reducing the number of hash calculations required could improve performance.
- EVM Execution: The `revm` specific calls such as `run_interpreter` and `preverified_inner` imply that EVM bytecode execution is a part of the process. Profiling the EVM's interpreter loop, opcode execution, and context switching could reveal bottlenecks.
- Smart Contract Calls: Calls to `inspect_call_instruction` and `Host::call` suggest that smart contract function calls are being made. Optimizing the way smart contracts are called and executed, possibly by reducing the overhead of call setup and teardown, could improve performance. This could include minimizing the overhead associated with setting up the environment for a contract call and efficiently handling the stack and memory operations.
- Parallelism and Concurrency: Evaluate if any parts of the `evm_mine` process can be executed in parallel. Some operations, especially state-independent ones, may benefit from concurrent execution.
- Memory Management: Functions like `drop_in_place` suggest that there is active management of memory, possibly with data structures being de-allocated. Improving memory allocation strategies, avoiding unnecessary allocations, and reusing memory buffers could reduce overhead and improve performance.
- Opcode Optimization: Within the EVM execution, certain opcodes may be used more frequently or may be more resource-intensive. Profiling at the opcode level could help identify if specific opcodes are bottlenecks and could be optimized.
- Caching Strategies: For repetitive operations, especially within the EVM interpreter, caching results of expensive computations could be beneficial if they're likely to be repeated with the same inputs.
- Profiling and Instrumentation Tools: Utilize profiling tools that can provide granular insights into CPU and memory usage. Rust's performance tools, such as `perf` on Linux or DTrace/BPF on BSD/Mac, can help identify hot paths and functions that are taking the most time or consuming the most resources.
- Algorithmic Efficiency: Review the algorithms used in the trie manipulation and hashing to ensure they are the most efficient for the use case. Sometimes, algorithmic improvements can yield better performance gains than low-level optimizations.
- Code Review and Refactoring: There might be opportunities to refactor the code for efficiency. This could involve combining functions, inlining functions to reduce call overhead, or simplifying complex logic.
- Batch Processing: If the `evm_mine` operation can be batched (i.e., processing multiple transactions or blocks in a single operation), it could reduce the per-operation overhead and take advantage of more efficient bulk processing techniques.
- Asynchronous Processing: Look into asynchronous processing where applicable to avoid blocking operations, particularly for I/O bound tasks.
thanks!
will investigate, but looks like stateroot
@mattsse thanks, might be a good idea to have flags that disable logic not really needed on a local node, similar to how the `eth_sendUnsignedTransaction` method can be used to send an unsigned transaction.
Relevant conversation in #7546: https://github.com/foundry-rs/foundry/pull/7546#issuecomment-2041338137