Benchmarking tests: Performance Statistical Data Analyzer

Open: Rebits opened this issue 7 months ago (11 comments).

Description

We need to develop a Performance Statistical Data Analyzer module capable of detecting performance regressions in our tests. This module should analyze statistical data collected during testing to identify any significant performance degradation relative to the baseline benchmarks.

Design

Based on the research conducted in https://github.com/wazuh/wazuh-qa/issues/5502, we have decided to develop this analysis module as a set of Python scripts. Libraries such as Pandas, NumPy, Matplotlib, or Seaborn can be used.

The following diagram gives a broad overview of the module:

graph LR

    Q[Metrics Data] --> P[Process Comparator]
    F1[Benchmark Baseline] --> P
    P --> R[Report]

    subgraph Statistical Data Analyzer
        P
    end    
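A minimal sketch of the data flow in the diagram, using only the Python standard library. None of the following comes from the issue; they are assumptions for illustration: the baseline lives in a SQLite table named `baseline` with `metric` and `value` columns, incoming metrics are dicts with `metric` and `value` keys, higher values mean worse performance, and the comparator uses a fixed 10% tolerance as a placeholder for a proper statistical test. The function names `load_baseline`, `validate`, `compare` and `render_html` are hypothetical.

```python
import html
import sqlite3
import statistics

# Hypothetical input format: every metrics record must carry these fields.
REQUIRED_FIELDS = {"metric", "value"}


def load_baseline(conn: sqlite3.Connection) -> dict[str, list[float]]:
    """Benchmark Baseline: read samples from a hypothetical 'baseline' table."""
    samples: dict[str, list[float]] = {}
    for metric, value in conn.execute("SELECT metric, value FROM baseline"):
        samples.setdefault(metric, []).append(float(value))
    return samples


def validate(records: list[dict], known_metrics: set[str]) -> dict[str, list[float]]:
    """Pre-processing: reject records that do not match the baseline format."""
    grouped: dict[str, list[float]] = {}
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS.difference(rec)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        if rec["metric"] not in known_metrics:
            raise ValueError(f"record {i} has no baseline: {rec['metric']!r}")
        grouped.setdefault(rec["metric"], []).append(float(rec["value"]))
    return grouped


def compare(current, baseline, tolerance=0.10):
    """Process Comparator (placeholder): flag metrics whose mean rose by more
    than `tolerance`, assuming higher values are worse (e.g. CPU %, latency)."""
    results = []
    for metric, values in current.items():
        base_mean = statistics.fmean(baseline[metric])
        cur_mean = statistics.fmean(values)
        if base_mean == 0:
            continue  # relative change is undefined against a zero baseline
        change = (cur_mean - base_mean) / base_mean
        results.append((metric, base_mean, cur_mean, change, change > tolerance))
    return results


def render_html(results) -> str:
    """Report: one table row per metric, with regressions marked."""
    rows = "".join(
        f"<tr><td>{html.escape(m)}</td><td>{b:.2f}</td><td>{c:.2f}</td>"
        f"<td>{d:+.1%}</td><td>{'REGRESSION' if r else 'ok'}</td></tr>"
        for m, b, c, d, r in results
    )
    return ("<table><tr><th>Metric</th><th>Baseline</th><th>Current</th>"
            "<th>Change</th><th>Status</th></tr>" + rows + "</table>")


if __name__ == "__main__":
    # In-memory database standing in for the real baseline store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseline (metric TEXT, value REAL)")
    conn.executemany("INSERT INTO baseline VALUES (?, ?)",
                     [("cpu_percent", v) for v in (10.0, 11.0, 9.0)])
    baseline = load_baseline(conn)
    current = validate([{"metric": "cpu_percent", "value": v}
                        for v in (13.0, 12.0, 14.0)], set(baseline))
    print(render_html(compare(current, baseline)))
```

Keeping each box of the diagram as a separate function means the placeholder comparator can later be swapped for a statistical one without touching validation or reporting.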

Functional requirements

  • Incoming metrics data must follow the same format as the baseline so that the two can be compared. The module should therefore include a pre-processing step that validates incoming data before comparison.
  • The baseline benchmarks must be stored in a location the module can access and read from (for example, a database).
  • The module should compare current metrics data against the baseline benchmarks and use statistical methods to detect significant performance regressions.
  • After the comparison, a report of the results should be generated (in HTML or, if possible, PDF). The report must highlight significant regressions and include the data that supports each finding.
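The issue leaves the statistical method open. One possible choice, swapped in here purely for illustration, is a one-sided permutation test on the difference in means: it makes no normality assumption and needs only the standard library. The sketch assumes higher values are worse (latency, CPU, memory); the function names, the `alpha` of 0.05 and the `min_effect` cutoff are hypothetical.

```python
import random
import statistics


def permutation_test(baseline, current, n_resamples=10_000, seed=0):
    """One-sided permutation test for H1: mean(current) > mean(baseline).

    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so reports are reproducible
    observed = statistics.fmean(current) - statistics.fmean(baseline)
    pooled = list(baseline) + list(current)
    n_base = len(baseline)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign samples to the two groups under H0 (no difference).
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n_base:]) - statistics.fmean(pooled[:n_base])
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the Monte Carlo estimate from reporting p = 0.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


def is_regression(baseline, current, alpha=0.05, min_effect=0.0):
    """Flag a regression only if it is significant at `alpha` AND the mean
    shift exceeds `min_effect` (in the metric's own units).

    Returns (flag, observed difference, p-value) so the report can show
    the evidence behind each finding.
    """
    observed, p_value = permutation_test(baseline, current)
    return p_value < alpha and observed > min_effect, observed, p_value


if __name__ == "__main__":
    baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
    current = [12.0, 12.1, 11.9, 12.3, 12.0, 11.8, 12.2, 12.1]
    print(is_regression(baseline, current))
```

Requiring both a small p-value and a minimum effect size keeps the report from flagging shifts that are statistically significant but too small to matter, and returning the observed difference and p-value gives the report the justifying data the requirement asks for.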

Non-functional requirements

  • The module must be optimized to handle large data sets efficiently.
  • The module must be as easy to use as possible.
  • The procedure for using the module must be documented.
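For the large-dataset requirement, one common approach (an assumption here, not something the issue prescribes) is to compute summary statistics in a single streaming pass instead of loading every sample into memory. The sketch below applies Welford's online algorithm to a CSV stream; the `RunningStats` class, the `stats_from_csv` helper and the `cpu_percent` column name are hypothetical.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); undefined for fewer than 2 samples.
        return self._m2 / (self.count - 1) if self.count > 1 else math.nan

    @property
    def stdev(self) -> float:
        return math.sqrt(self.variance)


def stats_from_csv(stream, column: str) -> RunningStats:
    """Read a CSV stream row by row instead of loading it all into memory."""
    stats = RunningStats()
    for row in csv.DictReader(stream):
        stats.update(float(row[column]))
    return stats


if __name__ == "__main__":
    sample = io.StringIO("cpu_percent\n2\n4\n4\n4\n5\n5\n7\n9\n")
    s = stats_from_csv(sample, "cpu_percent")
    print(s.count, s.mean, s.variance)
```

Pandas offers a similar chunked approach through `read_csv(..., chunksize=...)`; the standard-library version is shown so the example runs without extra dependencies.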

Validation

  • A proof of concept that analyzes sample data.

Rebits, Jul 18 '24 15:07