opentelemetry-cpp

Add stress testing framework, with basic metrics example to demonstrate.

Open lalitb opened this issue 9 months ago • 2 comments

Changes

This PR adds a basic stress testing framework to validate the scalability and reliability of functionality under high-concurrency and long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements for isolated operations, this framework puts a given workload under sustained, multi-threaded load. The idea is to complement the existing benchmarks with stress tests that address long-duration and high-concurrency use cases.

This is already implemented for .NET and Rust, and most of the ideas are taken from there. I felt the need for this to test some optimizations I am doing for metrics, but feel free to comment if this doesn't seem helpful.

Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:

$ ./stress_metrics
Starting stress test with 16 threads...
Throughput: 5009490 it/s | Avg: 4885764 | Min: 4734280 | Max: 5132395
 
Test completed:
Total iterations: 203373637
Duration: 42 seconds
Average throughput: 4885764 iterations/sec
$

It’s still in the early stages and will need further enhancements, but it should be a good starting point. Future improvements could include reporting memory and CPU usage alongside the existing throughput, as well as refining the initial warm-up period so that data collection stays consistent.

Implementation Details:

Worker Threads:

  • Worker threads (defaulting to the number of cores) are spawned to execute the workload.
  • Each worker thread executes the workload function (func) in a loop until a global STOP flag is set (for example, by Ctrl-C).
  • Each thread maintains its own iteration count to minimize contention.
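
The worker-thread design described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `g_stop`, `OnSigInt`, `WorkerCounter`, and `RunFor` are hypothetical names, the 64-byte alignment is an assumed cache-line size, and the fixed run duration is added only so the sketch terminates without Ctrl-C.

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// All names below are hypothetical; this is not the implementation in the PR.

// Global stop flag polled by every worker.
std::atomic<bool> g_stop{false};

// Ctrl-C handler: it only sets the flag, the workers notice it on their next loop.
extern "C" void OnSigInt(int) { g_stop.store(true); }

// One counter per worker. The 64-byte alignment (an assumed cache-line size)
// keeps neighbouring counters on separate cache lines to limit false sharing.
struct alignas(64) WorkerCounter
{
  std::atomic<uint64_t> iterations{0};
};

// Runs `func` on `num_threads` workers until the stop flag is set, either by
// Ctrl-C or after `duration`, and returns the total iteration count.
uint64_t RunFor(const std::function<void()> &func,
                std::size_t num_threads,
                std::chrono::milliseconds duration)
{
  std::signal(SIGINT, OnSigInt);

  std::vector<WorkerCounter> counters(num_threads);
  std::vector<std::thread> workers;
  workers.reserve(num_threads);

  for (std::size_t i = 0; i < num_threads; ++i)
  {
    workers.emplace_back([&func, &counters, i] {
      while (!g_stop.load(std::memory_order_relaxed))
      {
        func();
        // Each worker writes only to its own counter.
        counters[i].iterations.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }

  std::this_thread::sleep_for(duration);
  g_stop.store(true);
  for (auto &worker : workers)
  {
    worker.join();
  }

  uint64_t total = 0;
  for (const auto &counter : counters)
  {
    total += counter.iterations.load();
  }
  return total;
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as `num_threads`, falling back to a fixed value when it returns 0. The over-aligned counters rely on C++17 aligned allocation.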

Throughput Monitoring:

  • A separate controller thread monitors throughput by periodically summing up iteration counts across threads.
  • Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.
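
To make the sliding-window idea concrete, here is a small sketch of a window that keeps the most recent throughput samples and reports their average, minimum, and maximum. `ThroughputWindow` and its methods are hypothetical names, and how the PR actually uses SLIDING_WINDOW_SIZE may differ; the sketch assumes the controller thread feeds it one iterations-per-second sample per reporting interval.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <numeric>

// Hypothetical helper, not taken from the PR: keeps the last `capacity`
// throughput samples and summarizes them.
class ThroughputWindow
{
public:
  explicit ThroughputWindow(std::size_t capacity) : capacity_(capacity) {}

  // Adds one sample (iterations per second) and evicts the oldest sample
  // once the window holds more than `capacity` entries.
  void Add(uint64_t iterations_per_sec)
  {
    samples_.push_back(iterations_per_sec);
    if (samples_.size() > capacity_)
    {
      samples_.pop_front();
    }
  }

  // Integer average of the samples currently in the window; 0 when empty.
  uint64_t Average() const
  {
    if (samples_.empty())
    {
      return 0;
    }
    const uint64_t sum = std::accumulate(samples_.begin(), samples_.end(), uint64_t{0});
    return sum / samples_.size();
  }

  // Smallest sample in the window; 0 when empty.
  uint64_t Min() const
  {
    return samples_.empty() ? 0 : *std::min_element(samples_.begin(), samples_.end());
  }

  // Largest sample in the window; 0 when empty.
  uint64_t Max() const
  {
    return samples_.empty() ? 0 : *std::max_element(samples_.begin(), samples_.end());
  }

private:
  std::size_t capacity_;
  std::deque<uint64_t> samples_;
};
```

With a capacity of 3, adding the samples 10, 20, 30, 40 leaves 20, 30, 40 in the window, so the average is 30, the minimum 20, and the maximum 40.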

Final Summary:

  • At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.
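
As a rough illustration of the final summary step, the sketch below derives an average throughput from a total iteration count and an elapsed time. `Summary` and `MakeSummary` are hypothetical names, and the use of fractional seconds is an assumption; the PR may compute or round these values differently.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical summary record, not the structure used in the PR.
struct Summary
{
  uint64_t total_iterations;
  double duration_seconds;
  double average_throughput;  // iterations per second
};

// Builds the summary from the total count and the elapsed wall-clock time.
// A zero duration yields a throughput of 0 instead of dividing by zero.
Summary MakeSummary(uint64_t total_iterations, std::chrono::duration<double> elapsed)
{
  const double seconds = elapsed.count();
  const double average = seconds > 0.0 ? static_cast<double>(total_iterations) / seconds : 0.0;
  return Summary{total_iterations, seconds, average};
}
```

For example, 1000 iterations over 2 seconds give an average throughput of 500 iterations per second.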

For significant contributions please make sure you have completed the following items:

  • [ ] CHANGELOG.md updated for non-trivial changes
  • [ ] Unit tests have been added
  • [ ] Changes in public API reviewed

lalitb, Jan 10 '25 19:01