
Benchmark for measuring ledger apply time

Open dmkozh opened this issue 7 months ago • 6 comments

Currently the go-to way of measuring Core's performance is the 'max TPS' test, which relies on generating transactions and sending them to a synthetic network. Unfortunately, this test does not measure ledger close times. That has been serviceable thus far, but it is not sufficient for most of the Soroban performance improvements. For example, it can't provide a good answer to questions like 'how does increasing the ledger-wide instruction limit from 500M to 1B affect the ledger close time?' or 'how much more capacity could we get if we applied transactions with 8 parallel threads?'

In order to be able to reason about the Soroban performance, a new type of test/benchmark is necessary. The basic requirements are as follows:

  • Runs on a single machine, without setting up a whole network, in order to isolate the costs and to use hardware closer in specs to the real validators (the max TPS tests use weaker machines)
    • It would be nice to run the test as part of CI, but an on-demand mode is probably sufficient initially
  • Allows customization of the network config
  • Generates a stream of synthetic Soroban transactions with configurable resource consumption and reasonably tight declared resource estimates. The transactions also have to succeed most of the time (at least 95%). Both properties are important so as not to inflate the modelled-to-real resource usage ratio
  • Builds a transaction set and applies it to the ledger
    • During application we need to make sure that we hit the necessary ledger-related logic, such as prefetch
  • Measures the apply time metrics (such as avg/p99/max) and potentially allows making assertions on them
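
As a rough illustration of the last point, the apply-time metrics could be summarized along these lines. This is a hedged sketch: `ApplyTimeStats`, `summarizeApplyTimes`, and the nearest-rank p99 definition are assumptions for illustration, not existing stellar-core code (the real implementation would likely hook into Core's existing medida metrics):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: summarize per-ledger apply durations (in ms)
// into the avg/p99/max metrics the benchmark would report.
struct ApplyTimeStats
{
    double avgMs;
    double p99Ms;
    double maxMs;
};

ApplyTimeStats
summarizeApplyTimes(std::vector<double> samplesMs)
{
    assert(!samplesMs.empty());
    std::sort(samplesMs.begin(), samplesMs.end());
    double sum = 0;
    for (double s : samplesMs)
    {
        sum += s;
    }
    // Nearest-rank p99: the smallest sample covering 99% of the
    // distribution.
    size_t idx =
        static_cast<size_t>(std::ceil(0.99 * samplesMs.size())) - 1;
    return {sum / samplesMs.size(), samplesMs[idx], samplesMs.back()};
}
```

Assertions on these stats (e.g. "p99 apply time stays under the close-time target") are what would turn the benchmark into a CI gate.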

There are also features that could be nice to have, but are not critical initially, such as:

  • Also generate a stream of reasonable classic transactions, including DEX transactions. This will allow us to reason about the classic vs Soroban impact on the close time and about scaling classic capacity up or down
  • Measure the tx set building time. That is useful because building tx sets with parallel support has a non-trivial cost
  • Provide eviction-related scenarios, such as generating transactions that create short-lived temporary entries
    • This requires ensuring that we're triggering the eviction logic as well
  • Start with a pubnet ledger snapshot in order to ensure we have more realistic ledger I/O timing

Some basic ideas for the initial version:

  • Factor out the tx generation logic from loadgen
  • Fix the resource estimation logic for tighter estimates (I think we should be reasonably close there)
  • For the benchmark itself:
    • Do preparation similar to loadgen (generate accounts, upload wasm, instantiate the necessary contracts, do settings upgrade), but by directly applying the transactions (no need to go through consensus here)
    • Define reasonable generated resource/inclusion fee distributions and generate transactions following these distributions until hitting the tx queue limits (i.e. 2x the ledger capacity for each resource)
      • Resource fees can be arbitrarily high
    • Build a tx set from the generated transactions and run the logic for closing the ledger with the generated tx set
    • Generate a new batch of transactions until we hit the 2x resource limit again
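
The generate/build/apply loop above could be sketched as follows. This is a toy model under loudly stated assumptions: `SyntheticTx`, `generateBatch`, `buildTxSet`, the uniform instruction distribution, and the single-resource greedy packing are all hypothetical stand-ins; the real benchmark would track every resource dimension and reuse Core's surge-pricing-aware tx set building:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical synthetic transaction: only the declared instruction
// resource is modelled here; the real version has all resource dims.
struct SyntheticTx
{
    uint64_t declaredInsns;
};

// Example limits: a 500M-instruction ledger and a 2x tx queue limit,
// per the sketch above.
constexpr uint64_t LEDGER_INSN_CAPACITY = 500'000'000;
constexpr uint64_t QUEUE_LIMIT = 2 * LEDGER_INSN_CAPACITY;

// Generate transactions until the declared resources hit the 2x
// queue limit. A uniform distribution stands in for the configurable
// resource/fee distributions the benchmark would define.
std::vector<SyntheticTx>
generateBatch(std::mt19937& rng)
{
    std::uniform_int_distribution<uint64_t> d(1'000'000, 50'000'000);
    std::vector<SyntheticTx> batch;
    uint64_t total = 0;
    while (total < QUEUE_LIMIT)
    {
        SyntheticTx tx{d(rng)};
        total += tx.declaredInsns;
        batch.push_back(tx);
    }
    return batch;
}

// Greedily pack a tx set up to ledger capacity, leaving the rest in
// the queue for the next iteration (a stand-in for real tx set
// building, which is fee-ordered and surge-pricing aware).
std::vector<SyntheticTx>
buildTxSet(std::vector<SyntheticTx>& queue)
{
    std::vector<SyntheticTx> set;
    uint64_t used = 0;
    while (!queue.empty() &&
           used + queue.back().declaredInsns <= LEDGER_INSN_CAPACITY)
    {
        used += queue.back().declaredInsns;
        set.push_back(queue.back());
        queue.pop_back();
    }
    return set;
}
```

Each benchmark iteration would then time the ledger close over `buildTxSet`'s output, top the queue back up to the 2x limit, and repeat, feeding the per-ledger durations into the apply-time metrics.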

dmkozh avatar Jul 18 '24 21:07 dmkozh