
Valkey Benchmarking Automation Framework


I made a similar benchmarking tool as a small project while benchmarking Valkey 7.2 vs 8: it ran standard valkey-benchmark tests daily against OSS Valkey, although it is no longer active.

  • GitHub link: https://github.com/roshkhatri/valkey-benchmark-tools/blob/main/README.md
  • Dashboard: https://d5gk5hctlvf6n.cloudfront.net/##

Introduction

Presently, we benchmark Valkey and its versions manually. Valkey contributors and maintainers need a reproducible benchmarking tool to guard against performance regressions and guide performance improvements.

The new framework automates benchmarking of every commit from the valkey/unstable branch in a controlled, repeatable, and secure environment. It is triggered either via a cron job or manually, and uses a GitHub Actions-based workflow (ci.yml, benchmark.yml, and cleanup.yml) to automate the setup, execution, and teardown phases.

We will add a new repository for this framework, which will also allow us to benchmark engine releases against different versions. We could also use this framework to add a label to a PR that runs the benchmark and posts the results back on the PR. Benchmarks will run on isolated EC2 instances or equivalent hardware using a dedicated client-server setup.

We want this system to be cloud agnostic, but to start with we will use EC2.

This Issue outlines:

  • The workflows required to set up environments, run benchmarks, and update results (Workflows are the controllers of the framework)
  • Python scripts needed to run benchmark and obtain results (This is the Framework for benchmarking)
  • The mechanism for parsing benchmark output and writing JSON metrics to any database.
  • How all components are secure, portable, and automated via GitHub Actions.

System Goals

Secure

  • Only authorized GitHub users can trigger benchmarks.
  • No data or binaries persist after the run.
  • Network access is restricted to essential client-server traffic.

Portable

  • Runs on AWS, other clouds, or on-prem hardware.
  • It can also be cloned to run a series of benchmarks locally.

Integrated with GitHub Actions

  • Handles triggering, logging, reporting, timeouts, and orchestration.
  • Supports both manual and scheduled (cron) runs.
  • Provides full observability of benchmarking lifecycle.

Self-Contained

  • Builds Valkey from source for every commit.
  • Cleans up data to avoid storage overflow.
  • Requires no pre-installed binaries or persistent state.

Extensible

  • Pluggable benchmark interface via Python scripts.
  • Custom test scripts (bash, compiled binaries) are easily supported.

Reproducible

  • Warm-up phase before recording results.
  • Bare-metal EC2 instances or isolated hardware.
  • OS and CPU affinity configured for performance stability.

Components

Diagram

[Architecture diagram]

Trigger Mechanism

  • Manual Dispatch: GitHub UI using workflow_dispatch
  • Scheduled Trigger: GitHub Action using cron

Workflows

ci.yml

  • Walks backward from the latest valkey/unstable commit.
  • Checks completed_commits.json to avoid repeats.
  • Dispatches the benchmark job for the commit and its build config.
  • Commits the uploaded artifacts from benchmark.yml to the repo.
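
As a rough illustration of the commit-selection step (not the actual ci.yml implementation), a small Python helper could walk the unstable history and skip commits already recorded; the helper name, the assumption that completed_commits.json is a flat list of commit ids, and the local-clone layout are all illustrative:

# Sketch only: pick the next commit to benchmark, newest first.
# Assumes completed_commits.json is a JSON list of commit ids and
# that the repo has origin/unstable fetched locally.
import json
import subprocess
from pathlib import Path

COMPLETED = Path("completed_commits.json")

def next_unbenchmarked_commit(repo_dir: str, limit: int = 50):
    done = set(json.loads(COMPLETED.read_text())) if COMPLETED.exists() else set()
    commits = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%H", f"-n{limit}", "origin/unstable"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    # Walk backward from the latest commit; return the first one not yet benchmarked.
    for commit in commits:
        if commit not in done:
            return commit
    return None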

benchmark.yml

  • Connects to the client and server EC2 instances.
  • Builds Valkey from the commit on the EC2 instance, which acts as a self-hosted runner.
  • Uses benchmark.py to run all test combinations from benchmark-configs.json.
  • Uploads the output files as artifacts and to S3 or any DB.

Outputs:

  • Updated completed_commits.json
  • results/{commit_id}/metrics.json to S3 or any DB

cleanup.yml

  • Kills Valkey processes and deletes temp data.
  • Ensures stateless cleanup on self-hosted runners.
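
A minimal sketch of the cleanup a self-hosted runner could perform (the process name matches valkey-server, but the directory names here are placeholders, not the actual cleanup.yml contents):

# Sketch only: stop leftover valkey-server processes and delete per-run data
# so nothing persists between benchmark runs. Directory names are placeholders.
import shutil
import subprocess

def cleanup(build_dir: str = "valkey", results_dir: str = "results") -> None:
    # Best-effort kill of any valkey-server still running after the benchmark.
    subprocess.run(["pkill", "-f", "valkey-server"], check=False)
    # Remove the build tree and local results to keep the runner stateless.
    for path in (build_dir, results_dir):
        shutil.rmtree(path, ignore_errors=True)

if __name__ == "__main__":
    cleanup()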

Benchmark Execution Flow

Checkout Commit

  • Clone the repository.
  • Checkout the selected commit.

Build & Configure

  • Build Valkey from source using the options from build-configs.json.
  • Apply OS tuning:
    • Use isolcpus to isolate CPUs.
    • Use taskset to pin the valkey process to the isolated CPUs (see the sketch below).
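
For example, the server could be started pinned to the isolated cores roughly like this (the CPU list, binary path, and flags are placeholders for illustration):

# Sketch only: start valkey-server pinned to CPUs isolated via isolcpus.
import subprocess

ISOLATED_CPUS = "2,3"               # must match the kernel's isolcpus= boot parameter
VALKEY_SERVER = "./src/valkey-server"

def start_pinned_server(port: int = 6379) -> subprocess.Popen:
    # taskset -c pins the server to the isolated CPU list for stable results.
    cmd = ["taskset", "-c", ISOLATED_CPUS, VALKEY_SERVER, "--port", str(port)]
    return subprocess.Popen(cmd)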

Run Benchmark Tests

  • Warm-up phase (5 mins).
  • Run commands:
    PING, SET, GET, INCR, LPUSH, RPUSH, LPOP,
    SADD, HSET, SPOP, ZADD, ZPOPMIN, LRANGE (100, 300, 600),
    MSET, XADD
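
A hedged sketch of how the client side might drive these tests with valkey-benchmark (the flags follow the redis-benchmark-compatible CLI; the exact warm-up workload and test-name mapping are assumptions):

# Sketch only: warm up the server, then run each configured command via
# valkey-benchmark and capture its CSV output for later parsing.
import subprocess
import time

def run_suite(host, port, commands, requests, keyspace, data_size, pipeline):
    base = ["valkey-benchmark", "-h", host, "-p", str(port),
            "-r", str(keyspace), "-d", str(data_size), "-P", str(pipeline)]

    # Warm-up phase (~5 minutes of SET/GET) before any results are recorded.
    deadline = time.time() + 300
    while time.time() < deadline:
        subprocess.run(base + ["-t", "set,get", "-n", "100000", "-q"], check=True)

    # Recorded phase: one CSV run per command.
    outputs = {}
    for cmd in commands:
        result = subprocess.run(base + ["-t", cmd.lower(), "-n", str(requests), "--csv"],
                                check=True, capture_output=True, text=True)
        outputs[cmd] = result.stdout
    return outputs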

Logging

  • Log the commands and outputs.

Collect Results

  • Log stdout/stderr
  • Save metrics to JSON
  • Upload:
    • S3/DB for permanent storage
    • GitHub Actions artifact (retained 30 days)
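
For the permanent-storage upload, a boto3 sketch could look like the following; the bucket name and key layout are assumptions, and the workflow could equally use the AWS CLI or another database client:

# Sketch only: push the per-commit metrics and logs to S3.
# Bucket name and key prefix are placeholders.
import boto3

def upload_results(commit_id: str, bucket: str = "valkey-benchmark-results") -> None:
    s3 = boto3.client("s3")
    for name in ("metrics.json", "logs.txt"):
        local_path = f"results/{commit_id}/{name}"
        # Key results by commit so dashboards can query per-commit history.
        s3.upload_file(local_path, bucket, f"results/{commit_id}/{name}")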

Cleanup

  • Run cleanup.yml to wipe build, logs, and processes.

Benchmark Tool Layout

src/
├── configs/
│   ├── benchmark-configs.json
│   └── build-configs.json
│
├── utils/
│   ├── logger.py
│   ├── process_metrics.py
│   ├── valkey_build.py
│   ├── valkey_server.py
│   └── valkey_benchmark.py
│
├── benchmark.py
└── README.md

Sample benchmark-configs.json

[
  {
    "requests": 10000000,
    "keyspacelen": 10000000,
    "data_sizes": [16, 128, 1024],
    "pipelines": [10],
    "commands": ["SET", "GET", "RPUSH", "LPUSH", "LPOP", "SADD", "SPOP", "HSET"]
  }
]

We can add more scenarios to this array in the future.


Sample build-configs.json

{
  "cluster_modes": ["no", "yes"],
  "tls_modes": [false, true]
}

Logic Breakdown

benchmark.py

  • To interact with the tool, we call benchmark.py with options like the following:
python src/benchmark.py \
    --mode client \
    --commit 8d0f90a \
    --target_ip $SERVER_IP \
    --port $PORT \
    --tls no \
    --cluster no \
    --config src/configs/benchmark-configs.json
  • Loads benchmark-configs.json
  • Generates all benchmark combinations (see the sketch after this list).
  • Delegates to:
    • valkey_build.py to build with the provided options on the server and client machines
    • valkey_server.py to run the server
    • valkey_benchmark.py to run the tests on the client machine
  • Also updates completed_commits.json with the current commit_id
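
A minimal sketch of the combination expansion, assuming benchmark-configs.json and build-configs.json keep the shapes shown in the samples above (whether cluster/TLS modes are expanded here or dispatched per build is a simplification):

# Sketch only: expand the config files into the full set of benchmark runs.
import itertools
import json

def load_combinations(bench_path: str, build_path: str):
    with open(bench_path) as f:
        scenarios = json.load(f)
    with open(build_path) as f:
        build = json.load(f)
    for scenario in scenarios:
        for cluster, tls, size, pipeline, command in itertools.product(
            build["cluster_modes"], build["tls_modes"],
            scenario["data_sizes"], scenario["pipelines"], scenario["commands"],
        ):
            yield {
                "cluster": cluster,
                "tls": tls,
                "requests": scenario["requests"],
                "keyspacelen": scenario["keyspacelen"],
                "data_size": size,
                "pipeline": pipeline,
                "command": command,
            }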

valkey_benchmark.py

  • Handles benchmark logic:
    • Pings the server
    • Invokes valkey-benchmark
    • Delegates to:
      • logger.py, which streams output to /results/<commit_id>/logs.txt for post-run inspection
      • process_metrics.py, which generates the metrics in /results/<commit_id>/metrics.json
  • Outputs:
    • /results/<commit_id>/metrics.json
    • /results/<commit_id>/logs.txt
  • benchmark.yml uploads these files as GitHub Actions artifacts and also uploads the metrics to the chosen cloud storage:
      # COMMIT_ID is assumed to be exported as a job-level env variable.
      - name: Upload Results
        uses: actions/upload-artifact@v4
        with:
          name: valkey-benchmark-${{ env.COMMIT_ID }}
          path: |
            completed_commits.json
            results/${{ env.COMMIT_ID }}/metrics.json
            results/${{ env.COMMIT_ID }}/logs.txt
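
As an illustration of the metrics step, process_metrics.py could turn the --csv output of valkey-benchmark into metrics.json roughly as follows; the CSV columns differ between versions, so the header handling and the output shape are assumptions:

# Sketch only: convert captured valkey-benchmark --csv output into metrics.json.
import csv
import io
import json
from pathlib import Path

def write_metrics(commit_id, csv_outputs):
    # csv_outputs maps command name -> raw CSV text from valkey-benchmark.
    metrics = {"commit": commit_id, "tests": []}
    for command, raw_csv in csv_outputs.items():
        for row in csv.DictReader(io.StringIO(raw_csv)):
            row["command"] = command
            metrics["tests"].append(row)
    out = Path(f"results/{commit_id}/metrics.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, indent=2))
    return out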

Artifacts & Files

  • completed_commits.json — Tracks already tested commits
  • benchmark-configs.json — Benchmark input configurations
  • src/benchmark.py — Entry point and orchestrator
  • results/<commit_id>/metrics.json — Output metrics (RPS, latency)
  • results/<commit_id>/logs.txt — Benchmark execution logs
  • ci.yml — Triggers runs and selects commits that have not yet been benchmarked
  • benchmark.yml — Main execution workflow
  • cleanup.yml — Post-benchmark teardown

I have already started on the implementation. We can discuss this design here and gather more views from the community.
