Valkey Benchmarking Automation Framework
A similar benchmarking tool, a small project I built while benchmarking 7.2 vs 8.0, ran standard `valkey-benchmark` tests daily on OSS Valkey; it is no longer active.
* GitHub link: https://github.com/roshkhatri/valkey-benchmark-tools/blob/main/README.md
* Dashboard: https://d5gk5hctlvf6n.cloudfront.net/
Introduction
Presently, we benchmark Valkey and its versions manually. Valkey contributors and maintainers need a reproducible benchmarking tool to guard against performance regressions and guide performance improvements.
The new framework automates benchmarking of every commit from the valkey/unstable branch in a controlled, repeatable, and secure environment. It is triggered either via a cron job or manually, and it uses a GitHub Actions-based workflow (ci.yml, benchmark.yml, and cleanup.yml) to automate the setup, execution, and teardown phases.
We will add a new repository for this framework, which will also let us benchmark engine releases against different versions. We could also use it to run benchmarks when a label is added to a PR and post the results on that PR. Benchmarks will run on isolated EC2 instances or equivalent hardware using a dedicated client-server setup.
We want this system to be cloud-agnostic, but we will start with EC2.
This Issue outlines:
- The workflows required to set up environments, run benchmarks, and update results (workflows are the controllers of the framework)
- The Python scripts needed to run benchmarks and obtain results (this is the benchmarking framework itself)
- The mechanism for parsing benchmark output and writing it to JSON or any metrics database
- How all components are secure, portable, and automated via GitHub Actions
System Goals
Secure
- Only authorized GitHub users can trigger benchmarks.
- No data or binaries persist after the run.
- Network access is restricted to essential client-server traffic.
Portable
- Runs on AWS, other clouds, or on-prem hardware.
- It can also be cloned to run a series of benchmarks locally.
Integrated with GitHub Actions
- Handles triggering, logging, reporting, timeouts, and orchestration.
- Supports both manual and scheduled (cron) runs.
- Provides full observability of benchmarking lifecycle.
Self-Contained
- Builds Valkey from source for every commit.
- Cleans up data to avoid storage overflow.
- Requires no pre-installed binaries or persistent state.
Extensible
- Pluggable benchmark interface via Python scripts.
- Custom test scripts (bash, compiled) easily supported.
Reproducible
- Warm-up phase before recording results.
- Bare-metal EC2 instances or isolated hardware.
- OS and CPU affinity configured for performance stability.
Components
Diagram
Trigger Mechanism
- Manual Dispatch: from the GitHub UI using `workflow_dispatch`
- Scheduled Trigger: a GitHub Actions `cron` schedule
Workflows
ci.yml
- Walks backward from the latest `valkey/unstable` commit (the selection logic is sketched below).
- Checks `completed_commits.json` to avoid repeats.
- Dispatches the benchmark job for the commit and the build config.
- Commits the artifacts uploaded from `benchmark.yml` to the repo.
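As an illustration of the commit-selection step, here is a minimal sketch, assuming the Valkey repo is checked out locally and `completed_commits.json` holds a flat JSON list of benchmarked commit ids; the function name and the `git rev-list` usage are assumptions, not the final implementation.

```python
# Hypothetical sketch of the commit-selection step driven by ci.yml.
# Assumes the valkey repo is checked out in ./valkey and completed_commits.json
# holds a JSON list of already-benchmarked commit ids.
import json
import subprocess
from pathlib import Path

COMPLETED = Path("completed_commits.json")

def next_commit_to_benchmark(repo_dir: str = "valkey", limit: int = 50):
    done = set(json.loads(COMPLETED.read_text())) if COMPLETED.exists() else set()
    # Walk backward from the tip of unstable, newest first.
    commits = subprocess.run(
        ["git", "rev-list", f"--max-count={limit}", "origin/unstable"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.split()
    for commit in commits:          # newest -> oldest
        if commit not in done:
            return commit           # dispatch benchmark.yml for this commit
    return None                     # nothing new to benchmark
```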
benchmark.yml
- Connects to the client and server EC2 instances.
- Builds Valkey from the commit on the EC2 instance acting as a self-hosted runner.
- Uses `benchmark.py` to run all test combinations from `benchmark-configs.json`.
- Uploads the output files as artifacts and to S3 or any DB.
Outputs:
- Updated `completed_commits.json`
- `results/{commit_id}/metrics.json` pushed to S3 or any DB
cleanup.yml
- Kills Valkey processes and deletes temp data.
- Ensures stateless cleanup on self-hosted runners.
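For reference, a hedged sketch of what `cleanup.yml` could run on the self-hosted runner; the process name, directory names, and helper function are assumptions.

```python
# Hypothetical cleanup helper for the self-hosted runner: stop any leftover
# valkey-server processes and remove build artifacts, logs, and temp data.
import shutil
import subprocess
from pathlib import Path

def cleanup(workspace: str = ".") -> None:
    # Best-effort kill; ignore the non-zero exit when no process matches.
    subprocess.run(["pkill", "-f", "valkey-server"], check=False)
    for name in ("valkey", "results", "logs"):   # assumed directory layout
        shutil.rmtree(Path(workspace) / name, ignore_errors=True)
```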
Benchmark Execution Flow
Checkout Commit
- Clone the repository.
- Checkout the selected commit.
Build & Configure
- Build Valkey from source using the selected build config (`build-configs.json`).
- Apply OS tuning:
  - Use `isolcpus` to reserve isolated CPUs.
  - Use `taskset` to pin the Valkey process to the isolated CPUs (see the sketch after this list).
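As a concrete illustration of the CPU-pinning step, a minimal sketch of how `valkey_server.py` might launch the server under `taskset`; the CPU list, paths, and function name are placeholders.

```python
# Hypothetical launcher: start valkey-server pinned to CPUs reserved via
# isolcpus (e.g. kernel cmdline isolcpus=2,3) to reduce benchmark noise.
import subprocess

def start_pinned_server(valkey_dir: str, port: int, cpus: str = "2,3") -> subprocess.Popen:
    cmd = [
        "taskset", "-c", cpus,                  # pin to the isolated CPUs
        f"{valkey_dir}/src/valkey-server",
        "--port", str(port),
        "--daemonize", "no",
    ]
    return subprocess.Popen(cmd)
```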
Run Benchmark Tests
- Warm-up phase (5 mins) before recording results (see the sketch after this list).
- Run commands:
  PING, SET, GET, INCR, LPUSH, RPUSH, LPOP,
  SADD, HSET, SPOP, ZADD, ZPOPMIN, LRANGE (100, 300, 600),
  MSET, XADD
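A sketch of how one test combination could be driven from the client, assuming the stock `valkey-benchmark` flags (`-h`, `-p`, `-t`, `-n`, `-r`, `-d`, `-P`, `--csv`); the five-minute warm-up loop and the helper name are assumptions.

```python
# Hypothetical driver for one test combination: warm up for ~5 minutes with
# repeated short runs, then perform the measured valkey-benchmark pass.
import subprocess
import time

def run_combination(host: str, port: int, command: str, requests: int,
                    keyspace: int, data_size: int, pipeline: int) -> str:
    base = ["valkey-benchmark", "-h", host, "-p", str(port),
            "-t", command, "-r", str(keyspace), "-d", str(data_size),
            "-P", str(pipeline)]
    # Warm-up phase: keep issuing short runs for ~5 minutes before recording.
    deadline = time.monotonic() + 300
    while time.monotonic() < deadline:
        subprocess.run(base + ["-n", "100000"], capture_output=True, check=True)
    # Measured run; --csv output is easy to parse into metrics.json.
    result = subprocess.run(base + ["-n", str(requests), "--csv"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```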
Logging
- Log the commands and outputs.
Collect Results
- Log stdout/stderr
- Save metrics to JSON
- Upload:
- S3/DB for permanent storage
- GitHub Actions artifact (retained 30 days)
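For the permanent-storage upload, a hedged example using boto3; the bucket name and key layout are placeholders, and any DB could be substituted.

```python
# Hypothetical S3 upload of the per-commit metrics file; any database could be
# used instead. Bucket name and key prefix are placeholders.
import boto3

def upload_metrics(commit_id: str, bucket: str = "valkey-benchmark-results") -> None:
    s3 = boto3.client("s3")
    s3.upload_file(
        Filename=f"results/{commit_id}/metrics.json",
        Bucket=bucket,
        Key=f"results/{commit_id}/metrics.json",
    )
```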
Cleanup
- Run `cleanup.yml` to wipe builds, logs, and processes.
Benchmark Tool Layout
src/
├── configs/
│ ├── benchmark-configs.json
│ └── build-configs.json
│
├── utils/
│ ├── logger.py
│ ├── process_metrics.py
│ ├── valkey_build.py
│ ├── valkey_server.py
│ └── valkey_benchmark.py
│
├── benchmark.py
└── README.md
Sample benchmark-configs.json
[
  {
    "requests": 10000000,
    "keyspacelen": 10000000,
    "data_sizes": [16, 128, 1024],
    "pipelines": [10],
    "commands": ["SET", "GET", "RPUSH", "LPUSH", "LPOP", "SADD", "SPOP", "HSET"]
  }
]
We can add more scenarios to this array in the future.
Sample build-configs.json
{
"cluster_modes": ["no", "yes"],
"tls_modes": [false, true]
}
Logic Breakdown
benchmark.py
- To interact with the tool, call `benchmark.py` with options like the following:
python src/benchmark.py \
--mode client \
--commit 8d0f90a \
--target_ip $SERVER_IP \
--port $PORT \
--tls no \
--cluster no \
--config src/configs/benchmark-configs.json
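The invocation above implies an argparse entry point roughly like the following sketch; the option names mirror the example, but the parser itself is illustrative, not the final code.

```python
# Sketch of the benchmark.py argument parser matching the invocation above.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Valkey benchmark orchestrator")
    parser.add_argument("--mode", choices=["client", "server"], required=True)
    parser.add_argument("--commit", required=True, help="commit id to benchmark")
    parser.add_argument("--target_ip", required=True, help="server IP to benchmark against")
    parser.add_argument("--port", type=int, default=6379)
    parser.add_argument("--tls", choices=["yes", "no"], default="no")
    parser.add_argument("--cluster", choices=["yes", "no"], default="no")
    parser.add_argument("--config", default="src/configs/benchmark-configs.json")
    return parser.parse_args()
```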
- Loads `benchmark-configs.json`.
- Generates all benchmark combinations (see the sketch after this list).
- Delegates to:
  - `valkey_build.py` to build with the provided options on the server and client machines
  - `valkey_server.py` to run the server
  - `valkey_benchmark.py` to run tests on the client machine
- Also updates `completed_commits.json` with the current `commit_id`.
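Generating all benchmark combinations can be a cross product over the list fields of each scenario; a minimal sketch, assuming the sample `benchmark-configs.json` schema shown earlier.

```python
# Sketch: expand each scenario in benchmark-configs.json into concrete
# (command, data_size, pipeline) combinations to hand to valkey_benchmark.py.
import itertools
import json

def load_combinations(path: str = "src/configs/benchmark-configs.json"):
    with open(path) as f:
        scenarios = json.load(f)
    for s in scenarios:
        for command, data_size, pipeline in itertools.product(
                s["commands"], s["data_sizes"], s["pipelines"]):
            yield {
                "command": command,
                "data_size": data_size,
                "pipeline": pipeline,
                "requests": s["requests"],
                "keyspacelen": s["keyspacelen"],
            }
```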
valkey_benchmark.py
- Handles benchmark logic:
  - Pings the server.
  - Invokes `valkey-benchmark`.
- Delegates to:
  - `logger.py`, which streams to `/results/<commit_id>/logs.txt` for post-run inspection
  - `process_metrics.py`, which generates the metrics in `metrics.json` (a parsing sketch follows the upload step below)
- Output files:
  - `/results/<commit_id>/metrics.json`
  - `/results/<commit_id>/logs.txt`
- `benchmark.yml` uploads these files to the GitHub runner as artifacts and also pushes the metrics to the respective cloud storage, for example:
# assumes COMMIT_ID is exported to the job environment (e.g. via $GITHUB_ENV)
- name: Upload Results
  uses: actions/upload-artifact@v4
  with:
    name: valkey-benchmark-${{ env.COMMIT_ID }}
    path: |
      completed_commits.json
      results/${{ env.COMMIT_ID }}/metrics.json
      results/${{ env.COMMIT_ID }}/logs.txt
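`process_metrics.py` is not shown in this issue; as one possible shape, here is a hedged sketch that folds `valkey-benchmark --csv` output into `metrics.json`, keyed by the benchmark parameters. The file layout and field names are assumptions.

```python
# Hypothetical metrics parser: turn valkey-benchmark --csv output into a
# per-commit metrics.json entry alongside the parameters that produced it.
import csv
import io
import json
from pathlib import Path

def write_metrics(csv_output: str, commit_id: str, combo: dict) -> None:
    rows = list(csv.DictReader(io.StringIO(csv_output)))
    out_dir = Path(f"results/{commit_id}")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metrics.json"
    existing = json.loads(path.read_text()) if path.exists() else []
    # Keep the benchmark parameters next to the measured rows.
    existing.append({"params": combo, "results": rows})
    path.write_text(json.dumps(existing, indent=2))
```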
Artifacts & Files
- `completed_commits.json` – Tracks already-tested commits
- `benchmark-configs.json` – Benchmark input configurations
- `src/benchmark.py` – Entry point and orchestrator
- `results/<commit_id>/metrics.json` – Output metrics (RPS, latency)
- `results/<commit_id>/logs.txt` – Benchmark execution logs
- `ci.yml` – Trigger and selector for commits not yet benchmarked
- `benchmark.yml` – Main execution workflow
- `cleanup.yml` – Post-benchmark teardown
I have already started on the implementation; let's discuss this design and gather more views from the community.