wazuh-qa
Performance for Vulnerability Detection module in clustered environments
Description
This issue is dedicated to conducting a thorough performance analysis of two proposed development approaches:
- @wazuh/devel-framework: https://github.com/wazuh/wazuh/issues/23058
- @wazuh/devel-core2 development: https://github.com/wazuh/wazuh/issues/22867
The objective is to perform performance tests and compare the results of both approaches. This comparative analysis will provide a comprehensive understanding of the potential impact on the product.
Test environment
Component | Quantity | Operating System | CPU (cores) | RAM (GB) | Disk (GB) |
---|---|---|---|---|---|
Master | 1 | Ubuntu 22 | 4 | 8 | 50 |
Workers | 2 | Ubuntu 22 | 4 | 8 | 50 |
Agent 1 | 1 | Ubuntu 22 | 2 | 4 | 30 |
Agent 2 | 1 | Windows 11 | 2 | 4 | 30 |
Load Balancer | 1 | Ubuntu 22 | 4 | 8 | 50 |
Indexers | 2 | Ubuntu 22 | 2 | 4 | 30 |
[!NOTE] The load balancer is located on the master node.
23058 Development Packages
Architecture | Framework development package URL |
---|---|
DEB | 4.8.0-python.vd.spike.deb.1 |
RPM | 4.8.0-python.vd.spike.rpm.1 |
22867 Development Packages
Architecture | Core development package URL |
---|---|
DEB | 4.8.0-0.commitd31b277 |
RPM | 4.8.0-0.commitd31b277 |
Test Cases
Testing
Automatic
Methodology
The CLUSTER-Workload_benchmarks_metrics pipeline will execute the specified test cases automatically. Results will be analyzed manually and shared with the development team for validation and adjustments.
Test Cases
Case | Description | Number of Agents | EPS | Frequency | Number of Vulnerable Packages | Time |
---|---|---|---|---|---|---|
Minimum Activity | Simulate a small, stable environment with low activity | 10 | 10 | 600 | 100 | 3h |
Medium Activity | Simulate a medium-sized environment with moderate activity | 50 | 10 | 300 | 100 | 3h |
High Activity | Simulate a large-scale environment with significant activity | 200 | 50 | 60 | 100 | 3h |
Manual
Methodology
Customizing the set of vulnerable packages is not feasible in automatic testing. Therefore, manual testing will utilize a larger set of 10,000 vulnerabilities to identify any potential instability in environments with a high vulnerability count. The following Wazuh-QA tools will be employed for manual performance analysis:
- Monitor class for resource measurement of Wazuh central components
- Statistics class for Wazuh data analysis
- Simulate agents script for Wazuh agent simulation
Test Cases
Case | Description | Number of Agents | EPS | Frequency | Number of Vulnerable Packages | Time |
---|---|---|---|---|---|---|
High Vulnerability Environment | Simulate an intermediate-sized environment with high vulnerability | 10 | 10 | 60 | 10,000 | 3h |
Conclusion :red_circle:
New Issues
- https://github.com/wazuh/wazuh-jenkins/issues/6474
- https://github.com/wazuh/wazuh-jenkins/issues/6473
- https://github.com/wazuh/wazuh-jenkins/issues/6203
- https://github.com/wazuh/wazuh-jenkins/issues/6475
- https://github.com/wazuh/wazuh/issues/23202
- https://github.com/wazuh/wazuh/issues/22847
Known issues
- https://github.com/wazuh/wazuh-jenkins/issues/6203
[!NOTE] Manual performance testing and the Minimum Activity and High Activity tests have not been performed. More information in https://github.com/wazuh/wazuh-qa/issues/5313#issuecomment-2100349272
Automatic
- Minimum Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/510/
- Medium Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
- High Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/512/
The Minimum Activity and High Activity performance tests failed with a "No space left on device" error. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6475
22:03:52
22:03:52 TASK [Copy ossec.log file to data files] ***************************************
22:03:52 fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_2]: UNREACHABLE! => {
22:03:52 "changed": false,
22:03:52 "unreachable": true
22:03:52 }
22:03:52
22:03:52 MSG:
22:03:52
22:03:52 Warning: Permanently added '172.31.3.110' (ECDSA) to the list of known hosts.
22:03:52 mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.7137516-30912-167679972105845’: No space left on device
22:03:52
22:03:53 fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_1]: UNREACHABLE! => {
22:03:53 "changed": false,
22:03:53 "unreachable": true
22:03:53 }
22:03:53
22:03:53 MSG:
22:03:53
22:03:53 Warning: Permanently added '172.31.4.31' (ECDSA) to the list of known hosts.
22:03:53 mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.724964-30911-242038256013694’: No space left on device
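A failure like the one above could be caught earlier with a free-space check on the managers before the evidence-collection tasks run. The following is only a minimal sketch: `has_free_space` is a hypothetical helper, not part of the pipeline, and the 1 GiB threshold is an arbitrary illustrative value.

```python
import shutil
import tempfile


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


# Illustrative call: check the temporary directory, where Ansible failed to
# create its working directory. The 1 GiB threshold is an assumption.
tmp_dir = tempfile.gettempdir()
print(tmp_dir, has_free_space(tmp_dir, 1024 ** 3))
```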
Only the Medium Activity performance test finished successfully. Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
Medium Activity :red_circle:
Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/ Report: Artifact.zip
Logs :red_circle:
Summary
- Worker logs indicate the same database error reported in https://github.com/wazuh/wazuh/issues/22847
- No errors present in the master node
- No errors present in the indexer nodes
Master :yellow_circle:
- Master node is started before the correct indexer configuration is set. Expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.
Worker 1 :red_circle:
- Worker node is started before the correct indexer configuration is set. Expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.
- Multiple database errors reported in https://github.com/wazuh/wazuh/issues/22847
2024/05/07 21:24:24 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2024/05/07 21:24:24 wazuh-remoted: INFO: (1410): Reading authentication keys file.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
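To gauge how often each missing table is reported, the `wazuh-db` errors can be tallied from `ossec.log`. This is a rough sketch: the regular expression is modeled only on the lines quoted above, and `count_missing_tables` is a hypothetical helper rather than an existing Wazuh-QA tool.

```python
import re
from collections import Counter

# Pattern modeled on the wazuh-db error lines quoted above (an assumption
# about the log format, based only on those samples).
MISSING_TABLE = re.compile(
    r"wazuh-db: ERROR: DB\((\d+)\) sqlite3_prepare_v2\(\) : no such table: (\w+)"
)


def count_missing_tables(lines):
    """Count 'no such table' errors per (agent database ID, table name)."""
    counts = Counter()
    for line in lines:
        match = MISSING_TABLE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

It could be run over the collected worker log, for example `count_missing_tables(open("ossec.log"))`, where the path depends on where the artifact was extracted.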
Worker 2 :yellow_circle:
- Worker node is started before the correct indexer configuration is set. Expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.
Indexer 1 :green_circle:
No warnings or errors
Indexer 2 :green_circle:
No warnings or errors
Metrics :red_circle:
Summary
- Low resource usage in the master node
- Possible file descriptor leaks. Reported in https://github.com/wazuh/wazuh/issues/23202
- Worker nodes show high CPU and memory usage. These elevated values are unsurprising given the unrealistic level of activity: an expected influx of 500 syscollector messages per second in a two-node cluster environment.
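The 500 messages per second figure is consistent with the Medium Activity parameters if the EPS column is read as a per-agent rate (50 agents × 10 EPS). The sketch below applies the same arithmetic to all three automatic test cases. The per-agent reading of EPS and the even split across the two workers are assumptions, not values confirmed by the pipeline.

```python
# (number of agents, EPS) taken from the automatic test case table.
TEST_CASES = {
    "Minimum Activity": (10, 10),
    "Medium Activity": (50, 10),
    "High Activity": (200, 50),
}
WORKERS = 2  # worker nodes in the test environment


def aggregate_eps(agents: int, eps_per_agent: int) -> int:
    """Total events per second, assuming EPS is a per-agent rate."""
    return agents * eps_per_agent


for name, (agents, eps) in TEST_CASES.items():
    total = aggregate_eps(agents, eps)
    # Even balancing between workers is an assumption for illustration.
    print(f"{name}: {total} events/s total, {total / WORKERS:.0f} per worker")
```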
Master :green_circle:
Metrics
Worker 1 :red_circle:
Metrics
Worker 2 :red_circle:
Metrics
Indexer 1 :green_circle:
No abnormal behavior detected
Metrics
Indexer 2 :green_circle:
No abnormal behavior detected
Metrics
Statistics :green_circle:
Vulnerabilities State :green_circle:
The vulnerability generator module, used by the simulate agents script, is designed to send 100 vulnerable packages to the manager and then confirm their removal. This behavior appears as a wave-shaped plot that peaks on each repetition once all vulnerabilities have been processed.
The plot shows that the indexer connector curve does not match the ideal expected shape, while the simulator itself performs as intended.
Implementing various testing methods to determine if the final number of vulnerabilities aligns with expectations at specific points during the test could be highly beneficial.
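One possible form for such a check is to compare the indexed vulnerability count against an expected value at chosen checkpoints. This is a minimal sketch only: `check_counts` is a hypothetical helper, and the expected count at each checkpoint would have to be derived from the simulator's actual schedule, which is not specified here.

```python
def check_counts(samples, checkpoints, tolerance=0):
    """Compare observed counts with expected counts at given checkpoints.

    samples: list of (seconds_since_start, indexed_count), sorted by time.
    checkpoints: dict mapping seconds_since_start -> expected count.
    Returns a list of (time, expected, observed) tuples for every mismatch.
    """
    failures = []
    for checkpoint, expected in sorted(checkpoints.items()):
        observed = None
        # Use the latest sample taken at or before the checkpoint.
        for sample_time, count in samples:
            if sample_time > checkpoint:
                break
            observed = count
        if observed is None or abs(observed - expected) > tolerance:
            failures.append((checkpoint, expected, observed))
    return failures
```

An empty result means every checkpoint matched within the tolerance. A non-empty result lists where the indexed state diverged from the expectation.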
Alerts :green_circle:
We anticipate that the alerts generated by both the workers and the manager should correspond with the indexed alert values. Nonetheless, there appears to be a discrepancy:
Due to the high activity levels, some variance between the written alerts and indexed alerts is expected. However, it would be advantageous to incorporate testing methods to gradually mitigate this, thereby stabilizing the environment over time.
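As an illustration of how that variance could be tracked, the sketch below computes the fraction of written alerts that have not been indexed and checks whether it shrinks across successive measurements. Both helpers are hypothetical, and how the written and indexed totals are collected is left open.

```python
def indexing_gap(written: int, indexed: int) -> float:
    """Fraction of written alerts that are not indexed (0.0 when nothing was written)."""
    if written == 0:
        return 0.0
    return (written - indexed) / written


def gap_is_shrinking(gaps) -> bool:
    """True if each successive gap is no larger than the previous one."""
    return all(later <= earlier for earlier, later in zip(gaps, gaps[1:]))
```

A test could sample both totals periodically and require `gap_is_shrinking` to hold once the load stops, with an acceptable final gap agreed with the development team.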
Evidence collection :red_circle:
The following errors were detected in the evidence-collection capabilities of the pipeline:
- Vulnerabilities and alerts indexed metrics do not contain timestamps. Including the timestamp would make it easy to compare these values with the rest of the plots. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6474
- Indexer statistics were present in the logcollector directory. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6473
- Statistics values for analysis are not correctly plotted. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6203
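For the missing timestamps, one option is to write each indexed-metric sample as a timestamped CSV row. This is a sketch of the idea only: the column layout and the `append_sample` helper are assumptions and do not reflect the pipeline's current output format.

```python
import csv
import io
from datetime import datetime, timezone


def append_sample(writer, metric, value, now=None):
    """Write one row: ISO 8601 UTC timestamp, metric name, value."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    writer.writerow([timestamp, metric, value])


# Illustrative usage with an in-memory buffer instead of a real file.
buffer = io.StringIO()
append_sample(csv.writer(buffer), "indexed_vulnerabilities", 42)
print(buffer.getvalue(), end="")
```

With a shared timestamp column, these samples could be plotted on the same time axis as the other metrics.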
Following a discussion with @juliamagan, we have decided not to repeat the unsuccessful High Activity and Minimum Activity performance tests. Instead, these tests will be relaunched in RC2.
GJ, but the graphs of the indexer 1 metrics cannot be displayed, perhaps because of an error in writing the comment.
LGTM
LGTM