wazuh-qa
Performance test is not working in Workload benchmark test
Description
Performing: Release 4.6.0 - Pre-Alpha1 - Workload benchmarks metrics.
The performance test is not functioning correctly in the Workload benchmark test.
Current behavior
When the test is triggered by the pipeline, the following issue occurs:
performance/test_cluster/test_cluster_performance/test_cluster_performance.py::test_cluster_performance FAILED
> pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
f" follow the proper structure.")
E Failed: Stats could not be retrieved, '/mnt/efs/tmp/CLUSTER-Workload_benchmarks_metrics/B_263' path may not exist, it is empty or it may not follow the proper structure.
Expected behavior
The performance test should run smoothly without encountering any path-related problems.
UPDATE
- I ran the test manually with the same parameters used in the pipeline (the problem seems to come from the way the path is searched).
- I'm still looking for a solution.
UPDATE
- I reviewed the .groovy file of test_cluster to see whether the error could be coming from there.
- I ran some additional executions of the test and tried some modifications.
Reopening
Failed again in 4.8.0-beta2. Results: https://github.com/wazuh/wazuh/issues/22126#issuecomment-1965093634
Closing
Was able to execute manually after parsing the cluster logs.
Reopened due to the failure encountered in wazuh/wazuh#24894
workload-4.9.0-alpha3-artifacts.zip
I tried to execute it manually, without success:
python3 -m pytest test_cluster_performance.py --artifacts_path='/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' --n_workers=25 --n_agents=50000 --html=report.html --self-contained-html
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/nstefani/git/wazuh-qa/tests, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, testinfra-5.0.0
collected 1 item
test_cluster_performance.py F [100%]
=================================== FAILURES ===================================
___________________________ test_cluster_performance ___________________________
artifacts_path = '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts'
n_workers = 25, n_agents = 50000
def test_cluster_performance(artifacts_path, n_workers, n_agents):
    """Check that a cluster environment did not exceed certain thresholds.

    This test obtains various statistics (mean, max, regression coefficient) from CSVs with
    data generated in a cluster environment (resources used and duration of tasks). These
    statistics are compared with thresholds established in the data folder.

    Args:
        artifacts_path (str): Path where CSVs with cluster information can be found.
        n_workers (int): Number of workers folders that are expected inside the artifacts path.
        n_agents (int): Number of agents in the cluster environment.
    """
    if None in (artifacts_path, n_workers, n_agents):
        pytest.fail("Parameters '--artifacts_path=<path> --n_workers=<n_workers> --n_agents=<n_agents>' are required.")

    # Check if there are threshold data for the specified number of workers and agents.
    selected_conf = f"{n_workers}w_{n_agents}a"
    if selected_conf not in configurations:
        pytest.fail(f"This is not a supported configuration: {selected_conf}. "
                    f"Supported configurations are: {', '.join(configurations.keys())}.")

    # Check if path exists and if expected number of workers matches what is found inside artifacts.
    try:
        cluster_info = ClusterEnvInfo(artifacts_path).get_all_info()
    except FileNotFoundError:
        pytest.fail(f"Path '{artifacts_path}' could not be found or it may not follow the proper structure.")
    if cluster_info.get('worker_nodes', 0) != int(n_workers):
        pytest.fail(f"Information of {n_workers} workers was expected inside the artifacts folder, but "
                    f"{cluster_info.get('worker_nodes', 0)} were found.")

    # Calculate stats from data inside artifacts path.
    data = {'tasks': ClusterCSVTasksParser(artifacts_path).get_stats(),
            'resources': ClusterCSVResourcesParser(artifacts_path).get_stats()}
    if not data['tasks'] or not data['resources']:
>       pytest.fail(f"Stats could not be retrieved, '{artifacts_path}' path may not exist, it is empty or it may not"
                    f" follow the proper structure.")
E Failed: Stats could not be retrieved, '/home/nstefani/Downloads/workload-4.9.0-alpha3-artifacts' path may not exist, it is empty or it may not follow the proper structure.
test_cluster_performance.py:68: Failed
- generated html file: file:///home/nstefani/git/wazuh-qa/tests/performance/test_cluster/test_cluster_performance/report.html -
=========================== short test summary info ============================
FAILED test_cluster_performance.py::test_cluster_performance - Failed: Stats ...
============================== 1 failed in 0.57s ===============================
@rafabailon it's necessary to review why no binary data was collected in build https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/590/console This could be related to new changes included in the pipeline recently https://github.com/wazuh/wazuh-jenkins/pull/6608
Update
I've looked through the code and it seems that some files are missing. The error occurs when ClusterCSVResourcesParser is asked to use the following files: ['wazuh_clusterd', 'integrity_sync', 'wazuh_clusterd_child_1', 'wazuh_clusterd_child_2']. I was able to verify that the only file that exists is 'integrity_sync'. The parser also expects the columns ['USS(KB)', 'CPU(%)', 'FD'], which do not exist in the only file that is present.
The missing files are not referenced in the pipeline logs and there is no error in the artifacts indicating that something went wrong.
The changes in https://github.com/wazuh/wazuh-jenkins/pull/6608 should not affect this as the option is not checked in the pipeline execution.
I have launched the pipeline to continue the research: CLUSTER-Workload_benchmarks_metrics/604/
Note: The pipeline requires 5000 Agents and 25 Managers (too much for a test)
Update
The error is that before 4.9.0, the apid process was called wazuh-apid, while since 4.9.0 it is called wazuh_apid. In the Jenkins pipeline, it is still listed as wazuh-apid. Since a process with that name does not exist in 4.9.0, the monitoring script fails and does not generate the .csv files.
There are two possible ways to fix this error:
- Add both options in the pipeline. In this case, the code does not have to change, but care must be taken when launching the pipeline to use the correct value of apid depending on which version of Wazuh is used.
- Validate in the code. In this case, I have created a PR with the necessary changes: before the monitoring script is executed, the Wazuh version is checked and the name of apid is changed based on it.
I have tested running the monitoring script locally to make sure this is the error. I have also run the pipeline with the changes in the code to verify that the necessary .csv files now appear in the artifacts.
Update
Before 4.9.0, the process name was wazuh-apid. Since 4.9.0-beta1, the name has been changed to wazuh_apid. However, in the Jenkins pipeline, the parameter is still wazuh-apid. Changing the parameter in Jenkins would not be a solution, since it would then break pipeline executions for versions prior to 4.9.0. I have chosen instead to modify the name of the process in the pipeline code depending on which version of Wazuh is used.
Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/615/ Artifacts: artifacts.zip
Update
I've made the suggested changes and created a new PR with the correct branch nomenclature.
LGTM!