opentelemetry-cpp
Add stress testing framework, with basic metrics example to demonstrate.
Changes
This PR adds a basic stress testing framework to validate the scalability and reliability of functionality under high-concurrency and long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements for isolated operations, this framework runs a given workload under sustained, multi-threaded load. The idea is to complement the existing benchmarks with stress tests that address long-duration and high-concurrency use cases.
This is already implemented for .NET and Rust, and most of the ideas are taken from there. I felt the need for this to test some optimizations I am doing for metrics, but feel free to comment if this doesn't seem helpful.
Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:
$ ./stress_metrics
Starting stress test with 16 threads...
Throughput: 5009490 it/s | Avg: 4885764 | Min: 4734280 | Max: 5132395
Test completed:
Total iterations: 203373637
Duration: 42 seconds
Average throughput: 4885764 iterations/sec
$
It’s still in the early stages and will need further enhancements, but it should be a good starting point. Future improvements could include reporting memory and CPU usage alongside the existing throughput, as well as refining the initial warm-up period so that data collection is consistent.
Implementation Details:
Worker Threads:
- Worker threads (by default, one per core) are spawned to execute the workload.
- Each worker thread executes the workload function (func) in a loop until a global STOP flag is set (for example, by pressing Ctrl-C).
- Each thread maintains its own iteration count to minimize contention.

Throughput Monitoring:
- A separate controller thread monitors throughput by periodically summing the iteration counts across threads.
- Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.

Final Summary:
- At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.
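For reviewers who want a quick mental model before reading the diff, the design above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `RunStress`, `WorkerCounter`, `StressSummary`, and `kSlidingWindowSize` are hypothetical names, the test stops after a fixed duration rather than on Ctrl-C, and the calling thread plays the controller role instead of a dedicated thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is a sketch, not the PR's actual API.

// One counter per worker, padded to 64 bytes so that two counters never
// land on the same 64-byte cache line.
struct WorkerCounter
{
  std::atomic<std::uint64_t> iterations{0};
  char pad[64 - sizeof(std::atomic<std::uint64_t>)];
};

struct StressSummary
{
  std::uint64_t total_iterations;
  double seconds;
  double avg_throughput;  // iterations per second over the whole run
};

constexpr std::size_t kSlidingWindowSize = 3;  // assumed window length, in samples

static std::uint64_t SumCounters(const std::vector<WorkerCounter> &counters)
{
  std::uint64_t total = 0;
  for (const auto &c : counters)
    total += c.iterations.load(std::memory_order_relaxed);
  return total;
}

StressSummary RunStress(const std::function<void()> &func,
                        unsigned num_threads,
                        std::chrono::milliseconds duration,
                        std::chrono::milliseconds sample_interval)
{
  assert(num_threads > 0 && sample_interval.count() > 0);
  using Clock = std::chrono::steady_clock;

  std::atomic<bool> stop{false};  // stands in for the global STOP flag
  std::vector<WorkerCounter> counters(num_threads);
  std::vector<std::thread> workers;

  // Workers: run the workload in a loop, each bumping only its own counter.
  for (unsigned i = 0; i < num_threads; ++i)
  {
    workers.emplace_back([&, i] {
      while (!stop.load(std::memory_order_relaxed))
      {
        func();
        counters[i].iterations.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }

  // Controller: sample the summed counters and average the most recent samples.
  const auto start = Clock::now();
  auto last_time = start;
  std::uint64_t last_total = 0;
  std::deque<double> window;
  while (Clock::now() - start < duration)
  {
    std::this_thread::sleep_for(sample_interval);
    const auto now = Clock::now();
    const std::uint64_t total = SumCounters(counters);
    const double dt = std::chrono::duration<double>(now - last_time).count();
    window.push_back(static_cast<double>(total - last_total) / dt);
    if (window.size() > kSlidingWindowSize)
      window.pop_front();
    double sum = 0.0;
    for (double sample : window)
      sum += sample;
    std::printf("Throughput: %.0f it/s\n", sum / static_cast<double>(window.size()));
    last_total = total;
    last_time = now;
  }

  stop.store(true);
  for (auto &t : workers)
    t.join();

  // Final summary, computed after all workers have stopped.
  const double seconds = std::chrono::duration<double>(Clock::now() - start).count();
  const std::uint64_t total = SumCounters(counters);
  return {total, seconds, static_cast<double>(total) / seconds};
}
```

A caller would pass the workload as a callable, for example a lambda that records a metric, together with a thread count such as `std::thread::hardware_concurrency()` (which may return 0, so a fallback value is needed). Relaxed atomics are enough in this sketch because the counters are only used for statistics, and the final sum is read after the workers have been joined.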
For significant contributions please make sure you have completed the following items:
- [ ] CHANGELOG.md updated for non-trivial changes
- [ ] Unit tests have been added
- [ ] Changes in public API reviewed