
Test: automated performance testing suite

Open krizhanovsky opened this issue 8 years ago • 4 comments

Scope

We need to develop a performance testing suite for:

  • [ ] HTTP/2 Web cache with 10 and 100 streams
  • [ ] HTTPS Web cache
  • [ ] HTTP/2 proxy mode with 10 and 100 streams
  • [ ] HTTPS proxy mode

All the tests above must compare Tempesta FW against:

  • [ ] optimized HAproxy
  • [ ] optimized Nginx
  • [ ] optimized Envoy (no cache)
  • [ ] optimized Varnish
  • [ ] previous results for Tempesta FW 0.6.8 for now (no HTTP/2 tests so far)

The tests must measure:

  • [ ] RPS for empty 200 responses, and for 1KB and 100KB responses
  • [ ] average and 99th percentile latency (from #1096) for empty 200 responses, and for 1KB and 100KB responses
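
As a rough illustration (not a final choice of tools), the HTTP/2 cases with 10 and 100 concurrent streams map naturally onto h2load's `-m` option. The sketch below drives h2load and extracts RPS; the host, URI, and request counts are placeholders, and the output-parsing regex is only a guess at h2load's summary line:

```python
# Hypothetical sketch: collect RPS for the HTTP/2 cache tests with 10 and
# 100 concurrent streams. The target URL and request counts are placeholders.
import re
import subprocess

TARGET = "https://tfw.example.test/1kb.html"  # placeholder URL

def run_h2load(streams: int, requests: int = 100_000, clients: int = 100) -> str:
    cmd = [
        "h2load",
        "-n", str(requests),   # total number of requests
        "-c", str(clients),    # number of concurrent clients
        "-m", str(streams),    # max concurrent streams per client (10 or 100)
        TARGET,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout

for streams in (10, 100):
    report = run_h2load(streams)
    # h2load prints a summary like "finished in 12.34s, 8105.32 req/s, ...";
    # the exact format may vary between versions, so treat this as a guess.
    m = re.search(r"([\d.]+) req/s", report)
    print(f"{streams} streams: {m.group(1) if m else 'n/a'} req/s")
```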

These tests must run in two environments:

  • [ ] KVM
  • [ ] bare metal

The tests must run periodically on CI in smoke (short) mode, and also as a full run that includes the comparisons against the other web servers.

The test results should be stored on the server filesystem along with the configuration and system statistics (memory and CPU usage to start with). The benchmark results must also be stored as text files together with the command line used to run the benchmark.

The CI jobs for the smoke performance tests must plot a Grafana graph to compare with previous runs and observe the trend.

Representing performance measurements

The benchmark runs must be cleaned to avoid deviations in the results. Different projects use 3-25 runs to get clean data and apply different approaches for cleaning:

See https://bencher.dev/docs/explanation/thresholds/
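
As a sketch of what such cleaning could look like (the run count, the IQR-based outlier filter, and the 5% regression threshold below are assumptions, not agreed values):

```python
# Hypothetical sketch: repeat a benchmark several times, drop outliers with a
# simple IQR filter, and flag a regression against a stored baseline.
import statistics

def clean(samples: list[float]) -> list[float]:
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if low <= s <= high]

def is_regression(samples: list[float], baseline_rps: float,
                  threshold: float = 0.05) -> bool:
    # Compare the median of the cleaned runs against the previous baseline;
    # a drop of more than `threshold` (5% here) is reported as a regression.
    current = statistics.median(clean(samples))
    return current < baseline_rps * (1.0 - threshold)

# Example: 7 runs of the same benchmark, baseline taken from a previous build.
runs = [80120.0, 79800.5, 80510.2, 61200.0, 80050.7, 79990.1, 80210.4]
print(is_regression(runs, baseline_rps=80000.0))
```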

References

The following issues address problems that should be revealed by this test suite, but currently require manual work.

  • #1415
  • #515
  • #1064 (using tls-perf)

https://github.com/nyrkio/dsi - automated performance regression testing in Python, inherited from MongoDB

https://github.com/bencherdev/bencher - similar project in Rust

krizhanovsky commented on Jul 30 '17 23:07

Performance Testing Plan

1. Existing Stress Tests:
    Can be used for performance testing.
    Need to write a configuration with a reasonable number of requests and parameters.

2. Grafana for Results Visualization:
    Determine how to calculate metrics for each test.
    Initially, it is sufficient to have a single metric for each test (total of 4 metrics).

3. CI (Continuous Integration):
    Set up a dedicated worker (virtual machine) for running performance tests.
    Create a separate pipeline for execution. It should operate only on the dedicated virtual machine for performance testing.

4. Reporting Script:
    Develop a script (a possible sketch follows this plan) that will:
        - Report results.
        - Log installed packages.
        - Store all information locally in an archived format.

5. Grafana Charts:
    Draw a separate chart in Grafana for each of the 4 test suites.

6. Running Tests against HAproxy/nginx/Envoy:
    Add the execution of these tests against HAproxy/nginx/Envoy.
    Display the results on the charts of the corresponding test suites.
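
A possible shape for the reporting script from step 4, assuming a Debian-based worker (the dpkg call, directory layout, and file names are assumptions):

```python
# Hypothetical sketch of the reporting script: dump results, record installed
# packages, and archive everything locally. Paths and file names are placeholders.
import json
import subprocess
import tarfile
from datetime import datetime, timezone
from pathlib import Path

def save_report(results: dict, out_dir: str = "/var/lib/perf-reports") -> Path:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_dir = Path(out_dir) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)

    # 1. Report results as plain JSON next to the raw benchmark output.
    (run_dir / "results.json").write_text(json.dumps(results, indent=2))

    # 2. Log installed packages (dpkg is assumed; rpm hosts would need a tweak).
    pkgs = subprocess.run(["dpkg", "-l"], capture_output=True, text=True)
    (run_dir / "packages.txt").write_text(pkgs.stdout)

    # 3. Store all information locally in an archived format.
    archive = run_dir.with_suffix(".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)
    return archive

print(save_report({"rps": 80210.4, "p99_latency_ms": 3.1}))
```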

ykargin commented on Jun 27 '24 00:06

We agreed on the call that we'll go with adjusting our existing code from https://github.com/tempesta-tech/tempesta-test/ to build the performance regression test suite.

krizhanovsky commented on Jul 23 '24 18:07

Another View on Building Such a Utility

Overview

The idea is to build a Python application based on Flask/Flask-Admin for managing and visualizing historical data, triggering regression tests, generating charts, and configuring tests. The architecture will involve Prometheus for collecting performance metrics and Celery for background task management.


1. Python App (Flask/Flask-Admin)

A simple application using Flask-Admin to provide a CRUD interface for various tables. This setup will include standard CMS-style CRUD operations along with filters, bulk actions, and more.

Steps to Implement:

  1. Create PostgreSQL Database and Define Schema
    Tables to include (a possible model sketch follows this list):

    • TestCase[id: int, name: str, description: str, command: str, created_at: datetime]
    • Chart[id: int, metric_name: str, y_axis_name: str]
    • ChartGroup[id: int, name: str, description: str]
    • Chart2ChartGroup[chart_id: int, chart_group_id: int]
    • TestCase2ChartGroup[test_case_id: int, chart_group_id: int]
    • TestRun[id: int, app_version: str, test_case: int]
    • TestCaseResult[id: int, test_run_id: int, started_at: datetime, finished_at: datetime, error: str, status: str, created_at: datetime]
    • Metric[id: int, name: str, value: float, test_case_result_id: int, created_at: datetime]
  2. Extend Flask-Admin Pages for Result View

    • Create an HTML page that uses results from the database to generate charts based on TestCase and its associated ChartGroup.
    • Provide interactive chart views for better visualization.
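
A possible sketch of part of the schema from step 1 as SQLAlchemy models (SQLAlchemy itself, the table subset, and the string lengths are assumptions; the real schema would cover all of the tables above):

```python
# Hypothetical sketch of a few of the tables from step 1 as SQLAlchemy models.
from datetime import datetime
from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class TestCase(Base):
    __tablename__ = "test_case"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    description = Column(Text)
    command = Column(Text, nullable=False)        # how to launch the benchmark
    created_at = Column(DateTime, default=datetime.utcnow)

class TestRun(Base):
    __tablename__ = "test_run"
    id = Column(Integer, primary_key=True)
    app_version = Column(String(64), nullable=False)   # Tempesta FW version
    test_case = Column(Integer, ForeignKey("test_case.id"), nullable=False)

class TestCaseResult(Base):
    __tablename__ = "test_case_result"
    id = Column(Integer, primary_key=True)
    test_run_id = Column(Integer, ForeignKey("test_run.id"), nullable=False)
    started_at = Column(DateTime)
    finished_at = Column(DateTime)
    error = Column(Text)
    status = Column(String(32))
    created_at = Column(DateTime, default=datetime.utcnow)

class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    value = Column(Float, nullable=False)
    test_case_result_id = Column(Integer, ForeignKey("test_case_result.id"))
    created_at = Column(DateTime, default=datetime.utcnow)
```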

2. Celery

Celery will be used to handle background tasks related to test execution and metrics collection.

Task Management:

  • Task 1:

    • A simple background task that triggers a command (Ansible playbook, Bash script, Python script, etc.), waits for execution to finish, and stores stdout / stderr in the database.
  • Task 2:

    • A task that fetches metrics from Prometheus based on the TestCase definition and stores them in the database.
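
A minimal sketch of what Task 1 could look like as a Celery task (the broker URL and the `store_result()` stub are placeholders for the real database layer):

```python
# Hypothetical sketch of Task 1: run a benchmark command in the background and
# keep its output. Broker URL and store_result() are placeholders.
import subprocess
from celery import Celery

app = Celery("perf", broker="redis://localhost:6379/0")  # placeholder broker

def store_result(test_run_id: int, stdout: str, stderr: str, status: str) -> None:
    print(test_run_id, status)  # stand-in for an INSERT into TestCaseResult

@app.task
def run_test_case(test_run_id: int, command: str) -> None:
    # Trigger the command (Ansible playbook, Bash script, Python script, ...),
    # wait for it to finish, and store stdout/stderr with the final status.
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    store_result(test_run_id, proc.stdout, proc.stderr,
                 "done" if proc.returncode == 0 else "failed")
```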

3. Prometheus

Prometheus will be used to gather various performance metrics such as:

  • HTTP traffic
  • CPU load
  • Memory usage
  • Network throughput
  • Disk I/O, etc.

Activate and configure the required metrics according to the needs of each test case.
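
For Task 2, the metrics for a test interval can be pulled over Prometheus' HTTP API; the server address and the example query (node exporter's CPU metric) below are assumptions:

```python
# Hypothetical sketch: pull a metric from Prometheus over its HTTP API for the
# interval of a finished test run. The address and the PromQL query are examples.
import time
import requests

PROM = "http://localhost:9090"  # placeholder Prometheus address

def fetch_range(query: str, start: float, end: float, step: str = "15s") -> list:
    resp = requests.get(f"{PROM}/api/v1/query_range",
                        params={"query": query, "start": start,
                                "end": end, "step": step})
    resp.raise_for_status()
    # Each series carries [timestamp, value] pairs for the requested interval.
    return resp.json()["data"]["result"]

end = time.time()
series = fetch_range('rate(node_cpu_seconds_total{mode!="idle"}[1m])',
                     start=end - 300, end=end)
for s in series:
    print(s["metric"], s["values"][:3])
```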


How It Works

Prepare

  1. Activate the required metrics in Prometheus.
  2. Define charts and chart groups, and associate test cases with relevant groups of charts.
  3. Create a TestCase record with the desired configuration.

Manual Flow

  1. User accesses the CMS and creates a new TestRun record with parameters: name, test case, Tempesta app version.
  2. This action triggers Task 1 and Task 2.
  3. As the test runs, tasks update their status in the database, providing a clear view of the test case lifecycle.

Automated Flow

  1. Similar to the manual flow, but the TestRun record is created by a CI system via a REST endpoint or by invoking specific tasks.
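
For the automated flow, the CI job could create the TestRun over a REST call; the endpoint path, payload fields, and token handling below are assumptions, not a defined API:

```python
# Hypothetical sketch of a CI job creating a TestRun via REST; the endpoint
# path, payload fields, and authentication are assumptions.
import os
import requests

resp = requests.post(
    "http://perf.example.test/api/test-runs",        # placeholder endpoint
    json={
        "name": "nightly-http2-cache",
        "test_case": 1,
        "app_version": os.environ.get("GIT_COMMIT", "unknown"),
    },
    headers={"Authorization": f"Bearer {os.environ.get('PERF_API_TOKEN', '')}"},
    timeout=30,
)
resp.raise_for_status()
print("created TestRun", resp.json())
```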

Estimated Time

2-3 weeks

Summary

This approach yields not just a regression tool for Tempesta FW, but a versatile solution applicable to any project. It can be used to run various tests, configure comprehensive monitoring (disk usage, I/O, CPU, network traffic, and so on), and apply it to different types of projects such as HTTP servers, databases, S3 storage systems, and more.

Additionally, it can be integrated with CI systems to monitor all commits and automatically detect which changes caused regressions.

Anyway, we could first check some of these projects and try to find something that fits our needs:

symstu-tempesta commented on Apr 01 '25 23:04

Probably @RomanBelozerov and @const-t have more to say, but I'd just prefer to avoid PostgreSQL usage and replace it with MySQL or MariaDB to keep our technology stack simple

krizhanovsky commented on Apr 02 '25 20:04