wazuh-qa

Performance test is not working in Workload benchmark test

pro-akim opened this issue 2 years ago

Description

While performing Release 4.6.0 - Pre-Alpha1 - Workload benchmarks metrics, the performance test did not work correctly in the Workload benchmark test.

Current behavior

When the test is triggered by the pipeline, the following issue occurs:

performance/test_cluster/test_cluster_performance/test_cluster_performance.py::test_cluster_performance FAILED

>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/mnt/efs/tmp/CLUSTER-Workload_benchmarks_metrics/B_263' path may not exist, it is empty or it may not follow the proper structure.

Expected behavior

The performance test should run smoothly without encountering any path-related problems.

pro-akim, Jul 07 '23 16:07

UPDATE

  • I ran the test with the same parameters the pipeline uses. The problem seems to come from the way the path is searched.
  • I'm still looking for a solution.

javiersanchz, Dec 11 '23 17:12

UPDATE

  • I reviewed the .groovy file for test_cluster to see whether the error could be coming from there.
  • I ran some tests and tried some modifications to the test.

javiersanchz, Dec 18 '23 17:12

Reopening

Failed again in 4.8.0-beta2. Results: https://github.com/wazuh/wazuh/issues/22126#issuecomment-1965093634

GGP1, Feb 26 '24 19:02

Closing

I was able to execute it manually after parsing the cluster logs.

GGP1, Feb 27 '24 13:02

Reopened due to the failure encountered in wazuh/wazuh#24894

workload-4.9.0-alpha3-artifacts.zip

I tried to execute it manually, without success:

python3 -m pytest test_cluster_performance.py --artifacts_path='/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' --n_workers=25 --n_agents=50000 --html=report.html --self-contained-html
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, testinfra-5.0.0
collected 1 item

test_cluster_performance.py F                                            [100%]

=================================== FAILURES ===================================
___________________________ test_cluster_performance ___________________________

artifacts_path = '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts'
n_workers = 25, n_agents = 50000

    def test_cluster_performance(artifacts_path, n_workers, n_agents):
        """Check that a cluster environment did not exceed certain thresholds.
    
        This test obtains various statistics (mean, max, regression coefficient) from CSVs with
        data generated in a cluster environment (resources used and duration of tasks). These
        statistics are compared with thresholds established in the data folder.
    
        Args:
            artifacts_path (str): Path where CSVs with cluster information can be found.
            n_workers (int): Number of workers folders that are expected inside the artifacts path.
            n_agents (int): Number of agents in the cluster environment.
        """
        if None in (artifacts_path, n_workers, n_agents):
            pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")
    
        # Check if there are threshold data for the specified number of workers and agents.
        selected_conf = f"{n_workers}w_{n_agents}a"
        if selected_conf not in configurations:
            pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                        f"Supported configurations are: {', '.join(configurations.keys())}.")
    
        # Check if path exists and if expected number of workers matches what is found inside artifacts.
        try:
            cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
        except FileNotFoundError:
            pytest.fail(f"Path '{artifacts_path}' could not be found or it may not follow the proper structure.")
    
        if cluster_info.get('worker_nodes', 0) != int(n_workers):
            pytest.fail(f"Information of {n_workers} workers was expected inside the artifacts folder, but "
                        f"{cluster_info.get('worker_nodes', 0)} were found.")
    
        # Calculate stats from data inside artifacts path.
        data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
                'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    
        if not data['tasks'] or not data['resources']:
>           pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                        f" follow the proper structure.")
E           Failed: Stats could not be retrieved, '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' path may not exist, it is empty or it may not follow the proper structure.

test_cluster_performance.py:68: Failed
- generated html file: file:///home/nstefani/git/wazuh-qa/tests/performance/test_cluster/test_cluster_performance/report.html -
=========================== short test summary info ============================
FAILED test_cluster_performance.py::test_cluster_performance - Failed: Stats ...
============================== 1 failed in 0.57s ===============================

nico-stefani, Jul 24 '24 16:07

@rafabailon, we need to review why no binary data was collected in build https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/590/console. This could be related to changes recently included in the pipeline: https://github.com/wazuh/wazuh-jenkins/pull/6608

Rebits, Aug 06 '24 10:08

Update

I've looked through the code and it seems that some files are missing. The error occurs because ClusterCSVResourcesParser expects the following files: ['wazuh_clusterd', 'integrity_sync', 'wazuh_clusterd_child_1', 'wazuh_clusterd_child_2']. I verified that the only one that exists is 'integrity_sync'. The parser also expects the columns ['USS(KB)', 'CPU(%)', 'FD'], and none of them exist in that file.
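To check a downloaded artifacts folder for this situation without running the whole test, a small script can look for the expected files and columns. This is a minimal sketch, not part of wazuh-qa: check_resource_csvs is a hypothetical helper, and it assumes each resource CSV is named <process>.csv and may sit anywhere below the artifacts path (for example, one subfolder per node). The file and column names are the ones listed in this comment.

```python
import csv
from pathlib import Path

# Names taken from the comment above; the folder layout is an assumption.
EXPECTED_FILES = ["wazuh_clusterd", "integrity_sync", "wazuh_clusterd_child_1", "wazuh_clusterd_child_2"]
EXPECTED_COLUMNS = ["USS(KB)", "CPU(%)", "FD"]


def check_resource_csvs(artifacts_path: str) -> dict:
    """Map each expected file name to a list of problems; an empty dict means all is present.

    Hypothetical helper: searches recursively for '<name>.csv' below artifacts_path and
    reads only the header row of every match.
    """
    root = Path(artifacts_path)
    problems = {}
    for name in EXPECTED_FILES:
        matches = sorted(root.rglob(f"{name}.csv"))
        if not matches:
            problems[name] = ["missing file"]
            continue
        for csv_file in matches:
            with csv_file.open(newline="") as handle:
                # An empty file yields an empty header, so every column is reported missing.
                header = next(csv.reader(handle), [])
            missing = [col for col in EXPECTED_COLUMNS if col not in header]
            if missing:
                problems.setdefault(name, []).extend(
                    f"{csv_file}: missing column {col}" for col in missing
                )
    return problems
```

For example, print(check_resource_csvs('/path/to/workload-artifacts')) should list 'missing file' for every absent process CSV and the missing columns for any CSV that exists but lacks them, provided the naming assumption above matches the real artifacts.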

The missing files are not referenced in the pipeline logs and there is no error in the artifacts indicating that something went wrong.

The changes in https://github.com/wazuh/wazuh-jenkins/pull/6608 should not affect this, because that option is not checked in the pipeline execution.

I have launched the pipeline to continue the investigation: CLUSTER-Workload_benchmarks_metrics/604/

Note: the pipeline requires 5000 agents and 25 managers, which is a lot of resources for a test.

rafabailon, Aug 06 '24 11:08

Update

The cause of the error: before 4.9.0, the apid process was called wazuh-apid, but since 4.9.0 it is called wazuh_apid. The Jenkins pipeline still lists it as wazuh-apid. Because no process with that name exists in 4.9.0, the monitoring script fails and does not generate the .csv files.

There are two possibilities to fix this error:

  • Add both options to the pipeline. This requires no code changes, but care must be taken when launching the pipeline to select the correct apid name for the Wazuh version being tested.

  • Validate in the code. I have created a PR with the necessary changes: before the monitoring script is executed, the Wazuh version is checked and the apid process name is set accordingly.
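The second option boils down to choosing the process name from the version string. The PR changes the Jenkins pipeline code, so the following is only a Python sketch of that logic, not the PR itself; parse_version, apid_process_name and the constants are hypothetical names. The thread gives the rename as starting at 4.9.0 (and, in a later comment, at 4.9.0-beta1); this sketch ignores pre-release suffixes, so every 4.9.0 build is assumed to use the new name.

```python
import re

# Hypothetical constants mirroring the rename described in this thread.
OLD_APID_NAME = "wazuh-apid"
NEW_APID_NAME = "wazuh_apid"
RENAME_VERSION = (4, 9, 0)


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from strings such as 'v4.9.0-alpha3' or '4.8.0'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Wazuh version: {version!r}")
    return tuple(int(part) for part in match.groups())


def apid_process_name(version: str) -> str:
    """Pick the API daemon process name the monitoring script should look for."""
    # Tuples compare element by element, so (4, 10, 0) >= (4, 9, 0) as expected.
    # Pre-release suffixes are dropped by parse_version (assumption, see above).
    if parse_version(version) >= RENAME_VERSION:
        return NEW_APID_NAME
    return OLD_APID_NAME
```

With this, apid_process_name("4.8.0-beta2") gives "wazuh-apid" and apid_process_name("v4.9.0-alpha3") gives "wazuh_apid". Comparing integer tuples rather than raw strings avoids treating "4.10.0" as older than "4.9.0".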

I ran the monitoring script locally to confirm that this is the cause of the error. I also ran the pipeline with the code changes and verified that the necessary .csv files now appear in the artifacts.

rafabailon, Aug 07 '24 10:08

Update

Before 4.9.0, the process name was wazuh-apid; starting with 4.9.0-beta1, it is wazuh_apid. However, the Jenkins pipeline parameter is still wazuh-apid. Changing the parameter in Jenkins would not solve the problem, because the pipeline would then fail for versions prior to 4.9.0. Instead, I chose to set the process name in the pipeline code based on the Wazuh version being used.

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/615/
Artifacts: artifacts.zip

rafabailon, Aug 08 '24 06:08

Update

I've made the suggested changes and created a new PR that follows the correct branch naming convention.

rafabailon, Aug 08 '24 08:08

LGTM!

jseg380, Aug 08 '24 08:08