Valkey Benchmarking Automation Framework
A similar benchmarking tool, a small project I built while benchmarking 7.2 vs 8.0, ran standard `valkey-benchmark` tests daily on OSS Valkey; it is no longer active.
* GitHub link: https://github.com/roshkhatri/valkey-benchmark-tools/blob/main/README.md
* Dashboard: https://d5gk5hctlvf6n.cloudfront.net/
Introduction
Presently, we benchmark Valkey and its versions manually. Valkey contributors and maintainers need a reproducible benchmarking tool to guard against performance regressions and guide performance improvements.
The new framework automates benchmarking of every commit from the valkey/unstable branch in a controlled, repeatable, and secure environment. It is triggered either via a cron job or manually, and it uses a GitHub Actions-based workflow (ci.yml, benchmark.yml, and cleanup.yml) to automate the setup, execution, and teardown phases.
We will add a new repository for this framework, which will also let us benchmark engine releases against different versions. We could also use it to run benchmarks when a label is added to a PR and post the results on that PR. Benchmarks will run on isolated EC2 instances or equivalent hardware using a dedicated client-server setup.
We want this system to be cloud-agnostic, but we will start with EC2.
This Issue outlines:
- The workflows required to set up environments, run benchmarks, and update results (workflows are the controllers of the framework)
- The Python scripts needed to run benchmarks and obtain results (this is the benchmarking framework itself)
- The mechanism for parsing benchmark output and writing it to JSON or any metrics database
- How all components are secure, portable, and automated via GitHub Actions
System Goals
Secure
- Only authorized GitHub users can trigger benchmarks.
- No data or binaries persist after the run.
- Network access is restricted to essential client-server traffic.
Portable
- Runs on AWS, other clouds, or on-prem hardware.
- It can also be cloned to run a series of benchmarks locally.
Integrated with GitHub Actions
- Handles triggering, logging, reporting, timeouts, and orchestration.
- Supports both manual and scheduled (cron) runs.
- Provides full observability of benchmarking lifecycle.
Self-Contained
- Builds Valkey from source for every commit.
- Cleans up data to avoid storage overflow.
- Requires no pre-installed binaries or persistent state.
Extensible
- Pluggable benchmark interface via Python scripts.
- Custom test scripts (bash, compiled) easily supported.
Reproducible
- Warm-up phase before recording results.
- Bare-metal EC2 instances or isolated hardware.
- OS and CPU affinity configured for performance stability.
Components
Diagram
Trigger Mechanism
- Manual Dispatch: from the GitHub UI using `workflow_dispatch`
- Scheduled Trigger: a GitHub Actions `cron` schedule
Workflows
ci.yml
- Walks backward from the latest `valkey/unstable` commit (the selection logic is sketched below).
- Checks `completed_commits.json` to avoid repeats.
- Dispatches the benchmark job for the commit and the build config.
- Commits the artifacts uploaded from `benchmark.yml` to the repo.
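As an illustration of the commit-selection step, here is a minimal sketch, assuming the Valkey repo is checked out locally and `completed_commits.json` holds a flat JSON list of benchmarked commit ids; the function name and the `git rev-list` usage are assumptions, not the final implementation.

```python
# Hypothetical sketch of the commit-selection step driven by ci.yml.
# Assumes the valkey repo is checked out in ./valkey and completed_commits.json
# holds a JSON list of already-benchmarked commit ids.
import json
import subprocess
from pathlib import Path

COMPLETED = Path("completed_commits.json")

def next_commit_to_benchmark(repo_dir: str = "valkey", limit: int = 50):
    done = set(json.loads(COMPLETED.read_text())) if COMPLETED.exists() else set()
    # Walk backward from the tip of unstable, newest first.
    commits = subprocess.run(
        ["git", "rev-list", f"--max-count={limit}", "origin/unstable"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.split()
    for commit in commits:          # newest -> oldest
        if commit not in done:
            return commit           # dispatch benchmark.yml for this commit
    return None                     # nothing new to benchmark
```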
benchmark.yml
- Connects to the client and server EC2 instances.
- Builds Valkey from the commit on the EC2 instance acting as a self-hosted runner.
- Uses `benchmark.py` to run all test combinations from `benchmark-configs.json`.
- Uploads the output files as artifacts and to S3 or any DB.
Outputs:
- Updated `completed_commits.json`
- `results/{commit_id}/metrics.json` pushed to S3 or any DB
cleanup.yml
- Kills Valkey processes and deletes temp data.
- Ensures stateless cleanup on self-hosted runners.
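For reference, a hedged sketch of what `cleanup.yml` could run on the self-hosted runner; the process name, directory names, and helper function are assumptions.

```python
# Hypothetical cleanup helper for the self-hosted runner: stop any leftover
# valkey-server processes and remove build artifacts, logs, and temp data.
import shutil
import subprocess
from pathlib import Path

def cleanup(workspace: str = ".") -> None:
    # Best-effort kill; ignore the non-zero exit when no process matches.
    subprocess.run(["pkill", "-f", "valkey-server"], check=False)
    for name in ("valkey", "results", "logs"):   # assumed directory layout
        shutil.rmtree(Path(workspace) / name, ignore_errors=True)
```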
Benchmark Execution Flow
Checkout Commit
- Clone the repository.
- Checkout the selected commit.
Build & Configure
- Build Valkey from source using the selected build config (`build-configs.json`).
- Apply OS tuning:
  - Use `isolcpus` to reserve isolated CPUs.
  - Use `taskset` to pin the Valkey process to the isolated CPUs (see the sketch after this list).
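As a concrete illustration of the CPU-pinning step, a minimal sketch of how `valkey_server.py` might launch the server under `taskset`; the CPU list, paths, and function name are placeholders.

```python
# Hypothetical launcher: start valkey-server pinned to CPUs reserved via
# isolcpus (e.g. kernel cmdline isolcpus=2,3) to reduce benchmark noise.
import subprocess

def start_pinned_server(valkey_dir: str, port: int, cpus: str = "2,3") -> subprocess.Popen:
    cmd = [
        "taskset", "-c", cpus,                  # pin to the isolated CPUs
        f"{valkey_dir}/src/valkey-server",
        "--port", str(port),
        "--daemonize", "no",
    ]
    return subprocess.Popen(cmd)
```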
Run Benchmark Tests
- Warm-up phase (5 mins) before recording results (see the sketch after this list).
- Run commands:
  PING, SET, GET, INCR, LPUSH, RPUSH, LPOP,
  SADD, HSET, SPOP, ZADD, ZPOPMIN, LRANGE (100, 300, 600),
  MSET, XADD
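A sketch of how one test combination could be driven from the client, assuming the stock `valkey-benchmark` flags (`-h`, `-p`, `-t`, `-n`, `-r`, `-d`, `-P`, `--csv`); the five-minute warm-up loop and the helper name are assumptions.

```python
# Hypothetical driver for one test combination: warm up for ~5 minutes with
# repeated short runs, then perform the measured valkey-benchmark pass.
import subprocess
import time

def run_combination(host: str, port: int, command: str, requests: int,
                    keyspace: int, data_size: int, pipeline: int) -> str:
    base = ["valkey-benchmark", "-h", host, "-p", str(port),
            "-t", command, "-r", str(keyspace), "-d", str(data_size),
            "-P", str(pipeline)]
    # Warm-up phase: keep issuing short runs for ~5 minutes before recording.
    deadline = time.monotonic() + 300
    while time.monotonic() < deadline:
        subprocess.run(base + ["-n", "100000"], capture_output=True, check=True)
    # Measured run; --csv output is easy to parse into metrics.json.
    result = subprocess.run(base + ["-n", str(requests), "--csv"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```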
Logging
- Log the commands and outputs.
Collect Results
- Log stdout/stderr
- Save metrics to JSON
- Upload:
- S3/DB for permanent storage
- GitHub Actions artifact (retained 30 days)
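For the permanent-storage upload, a hedged example using boto3; the bucket name and key layout are placeholders, and any DB could be substituted.

```python
# Hypothetical S3 upload of the per-commit metrics file; any database could be
# used instead. Bucket name and key prefix are placeholders.
import boto3

def upload_metrics(commit_id: str, bucket: str = "valkey-benchmark-results") -> None:
    s3 = boto3.client("s3")
    s3.upload_file(
        Filename=f"results/{commit_id}/metrics.json",
        Bucket=bucket,
        Key=f"results/{commit_id}/metrics.json",
    )
```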
Cleanup
- Run `cleanup.yml` to wipe builds, logs, and processes.
Benchmark Tool Layout
src/
├── configs/
│ ├── benchmark-configs.json
│ └── build-configs.json
│
├── utils/
│ ├── logger.py
│ ├── process_metrics.py
│ ├── valkey_build.py
│ ├── valkey_server.py
│ └── valkey_benchmark.py
│
├── benchmark.py
└── README.md
Sample benchmark-configs.json
[
  {
    "requests": 10000000,
    "keyspacelen": 10000000,
    "data_sizes": [16, 128, 1024],
    "pipelines": [10],
    "commands": ["SET", "GET", "RPUSH", "LPUSH", "LPOP", "SADD", "SPOP", "HSET"]
  }
]
We can add more scenarios to this array in the future.
Sample build-configs.json
{
"cluster_modes": ["no", "yes"],
"tls_modes": [false, true]
}
Logic Breakdown
benchmark.py
- To interact with the tool, call `benchmark.py` with options like the following:
python src/benchmark.py \
--mode client \
--commit 8d0f90a \
--target_ip $SERVER_IP \
--port $PORT \
--tls no \
--cluster no \
--config src/configs/benchmark-configs.json
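The invocation above implies an argparse entry point roughly like the following sketch; the option names mirror the example, but the parser itself is illustrative, not the final code.

```python
# Sketch of the benchmark.py argument parser matching the invocation above.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Valkey benchmark orchestrator")
    parser.add_argument("--mode", choices=["client", "server"], required=True)
    parser.add_argument("--commit", required=True, help="commit id to benchmark")
    parser.add_argument("--target_ip", required=True, help="server IP to benchmark against")
    parser.add_argument("--port", type=int, default=6379)
    parser.add_argument("--tls", choices=["yes", "no"], default="no")
    parser.add_argument("--cluster", choices=["yes", "no"], default="no")
    parser.add_argument("--config", default="src/configs/benchmark-configs.json")
    return parser.parse_args()
```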
- Loads `benchmark-configs.json`.
- Generates all benchmark combinations (see the sketch after this list).
- Delegates to:
  - `valkey_build.py` to build with the provided options on the server and client machines
  - `valkey_server.py` to run the server
  - `valkey_benchmark.py` to run tests on the client machine
- Also updates `completed_commits.json` with the current `commit_id`.
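Generating all benchmark combinations can be a cross product over the list fields of each scenario; a minimal sketch, assuming the sample `benchmark-configs.json` schema shown earlier.

```python
# Sketch: expand each scenario in benchmark-configs.json into concrete
# (command, data_size, pipeline) combinations to hand to valkey_benchmark.py.
import itertools
import json

def load_combinations(path: str = "src/configs/benchmark-configs.json"):
    with open(path) as f:
        scenarios = json.load(f)
    for s in scenarios:
        for command, data_size, pipeline in itertools.product(
                s["commands"], s["data_sizes"], s["pipelines"]):
            yield {
                "command": command,
                "data_size": data_size,
                "pipeline": pipeline,
                "requests": s["requests"],
                "keyspacelen": s["keyspacelen"],
            }
```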
valkey_benchmark.py
- Handles benchmark logic:
  - Pings the server.
  - Invokes `valkey-benchmark`.
- Delegates to:
  - `logger.py`, which streams to `/results/<commit_id>/logs.txt` for post-run inspection
  - `process_metrics.py`, which generates the metrics in `metrics.json` (a parsing sketch follows the upload step below)
- Output files:
  - `/results/<commit_id>/metrics.json`
  - `/results/<commit_id>/logs.txt`
- `benchmark.yml` uploads these files to the GitHub runner as artifacts and also pushes the metrics to the respective cloud storage, for example:
# assumes COMMIT_ID is exported to the job environment (e.g. via $GITHUB_ENV)
- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: valkey-benchmark-${{ env.COMMIT_ID }}
    path: |
      completed_commits.json
      results/${{ env.COMMIT_ID }}/metrics.json
      results/${{ env.COMMIT_ID }}/logs.txt
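`process_metrics.py` is not shown in this issue; as one possible shape, here is a hedged sketch that folds `valkey-benchmark --csv` output into `metrics.json`, keyed by the benchmark parameters. The file layout and field names are assumptions.

```python
# Hypothetical metrics parser: turn valkey-benchmark --csv output into a
# per-commit metrics.json entry alongside the parameters that produced it.
import csv
import io
import json
from pathlib import Path

def write_metrics(csv_output: str, commit_id: str, combo: dict) -> None:
    rows = list(csv.DictReader(io.StringIO(csv_output)))
    out_dir = Path(f"results/{commit_id}")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metrics.json"
    existing = json.loads(path.read_text()) if path.exists() else []
    # Keep the benchmark parameters next to the measured rows.
    existing.append({"params": combo, "results": rows})
    path.write_text(json.dumps(existing, indent=2))
```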
Artifacts & Files
- `completed_commits.json` – Tracks already-tested commits
- `benchmark-configs.json` – Benchmark input configurations
- `src/benchmark.py` – Entry point and orchestrator
- `results/<commit_id>/metrics.json` – Output metrics (RPS, latency)
- `results/<commit_id>/logs.txt` – Benchmark execution logs
- `ci.yml` – Trigger and selector for commits not yet benchmarked
- `benchmark.yml` – Main execution workflow
- `cleanup.yml` – Post-benchmark teardown
I have already started on the implementation; let's discuss this design and gather more views from the community.