wazuh-docker
Fix PR test
The PR tests of the wazuh-docker repository are failing. When the test verifies the number of documents stored in the Wazuh indexer at container startup, it gets a lower value than the one the test expects.
We checked alerts.log for alerts that had not been sent and found only the alert that appears in the test:
bash-5.2# cd /var/ossec/logs/alerts/
bash-5.2# ls -ltr
total 12
drwxr-x--- 3 wazuh wazuh 4096 Mar 6 09:38 2024
-rw-r----- 2 wazuh wazuh 241 Mar 6 09:39 alerts.log
-rw-r----- 2 wazuh wazuh 479 Mar 6 09:39 alerts.json
bash-5.2# cat alerts.log
** Alert 1709717950.0: - ossec,pci_dss_10.6.1,gpg13_10.1,gdpr_IV_35.7.d,hipaa_164.312.b,nist_800_53_AU.6,tsc_CC7.2,tsc_CC7.3,
2024 Mar 06 09:39:10 wazuh->wazuh-monitord
Rule: 502 (level 3) -> 'Wazuh server started.'
ossec: Manager started.
We will now verify whether this behavior is correct and, if not, what could be causing it.
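A minimal sketch of the comparison described above, in case it helps when reproducing the check by hand. It assumes that alerts.json holds one JSON object per line and that the indexer count comes from the body of a `_count` request against the alerts index pattern; the function names are hypothetical and this is not the code used by the PR test.

```python
import json


def count_alerts(alerts_json_text: str) -> int:
    """Count alerts in the contents of alerts.json (assumed: one JSON object per line)."""
    total = 0
    for line in alerts_json_text.splitlines():
        if line.strip():
            json.loads(line)  # raises ValueError if a line is not valid JSON
            total += 1
    return total


def indexed_count(count_response_body: str) -> int:
    """Read the "count" field from the body of an indexer _count response."""
    return int(json.loads(count_response_body)["count"])


def missing_documents(alerts_json_text: str, count_response_body: str) -> int:
    """Alerts written by the manager that are not (yet) present in the indexer."""
    return max(0, count_alerts(alerts_json_text) - indexed_count(count_response_body))
```

A result greater than zero would mean that alerts exist on disk but have not reached the indexer, which is the situation the alerts.log review above was trying to rule out.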
Blocked by:
- https://github.com/wazuh/wazuh/issues/22404
The error was caused by the number of alerts that the Wazuh agent generates in the Amazon Linux 2023 environment. We solved it by adjusting the number of events the test verifies so that it matches the current result.
The next error was in the index check, which returned a higher count once the Vulnerability Detection index was added. We solved it by adapting the condition.
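For illustration only, one way an index check can tolerate new indices is to compare against a minimum instead of an exact number. The sketch below assumes the input is the body of a `_cat/indices?format=json` response; the function names and the threshold are hypothetical, and the actual condition used in the test may differ.

```python
import json
from typing import List


def index_names(cat_indices_body: str) -> List[str]:
    """Return the sorted index names from a _cat/indices?format=json response body."""
    return sorted(entry["index"] for entry in json.loads(cat_indices_body))


def has_enough_indices(cat_indices_body: str, minimum_expected: int) -> bool:
    """Pass when at least `minimum_expected` indices exist (hypothetical threshold)."""
    return len(index_names(cat_indices_body)) >= minimum_expected
```

With a minimum rather than an equality, adding an index such as 'wazuh-states-vulnerabilities' raises the count without breaking the check.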
The last remaining failure is caused by errors that appear in ossec.log, which will be reported to the corresponding team for analysis:
2024/03/14 03:16:25 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/syslog' due to [(2)-(No such file or directory)].
2024/03/14 03:16:25 wazuh-logcollector: INFO: (1950): Analyzing file: '/var/log/syslog'.
2024/03/14 03:16:25 wazuh-logcollector: ERROR: (1103): Could not open file '/var/log/dpkg.log' due to [(2)-(No such file or directory)].
2024/03/14 03:16:25 wazuh-logcollector: INFO: (1950): Analyzing file: '/var/log/dpkg.log'.
2024/03/15 15:28:36 indexer-connector: WARNING: Error initializing IndexerConnector for index 'wazuh-states-vulnerabilities': Failed to initialize template for index 'wazuh-states-vulnerabilities'. Error: Failed to initialize template for index 'wazuh-states-vulnerabilities'. Error: Couldn't connect to server. Retrying in 2 seconds. Maximum wait time: 60 seconds.
2024/03/15 15:28:38 indexer-connector: WARNING: Error initializing IndexerConnector for index 'wazuh-states-vulnerabilities': Failed to initialize template for index 'wazuh-states-vulnerabilities'. Error: Failed to initialize template for index 'wazuh-states-vulnerabilities'. HTTP error: HTTP response code said error (Status code: 503).. Retrying in 4 seconds. Maximum wait time: 60 seconds.
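To summarize log excerpts like the one above before handing them to another team, a small sketch such as the following can group the entries by component and level. It assumes every line follows the `date time component: LEVEL: message` layout seen in the excerpt; the function name and the default set of levels are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: "YYYY/MM/DD HH:MM:SS component: LEVEL: message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<component>[\w-]+): (?P<level>[A-Z]+): (?P<message>.*)$"
)


def summarize_problems(log_text: str, levels=("ERROR", "WARNING")) -> Counter:
    """Count log entries per (component, level) for the given levels."""
    summary = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            summary[(match.group("component"), match.group("level"))] += 1
    return summary
```

Lines that do not match the assumed layout are skipped rather than reported, so the summary is a lower bound on the number of problems in the file.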
I have run quite a few tests on the PR test. The multi-node deployment test has problems: the runner that executes it crashes at random. I ran the same tests on v4.7.3 and the crashes do not occur, so this looks like a resource consumption problem.
We are continuing to test to determine exactly what causes the runners to crash.
The deployment tests were run again with the code modified to give the stack services more startup time, and they continued to fail.
We checked the GitHub documentation for other machines that could be used. Besides Ubuntu 22, the only other option is Ubuntu 20; a test run on it also failed:
https://github.com/wazuh/wazuh-docker/actions/runs/8453833475/job/23158213880
According to the GitHub documentation, the machines that run these tests have 4 vCPUs and 16 GB of RAM, because the repository is public. That size should be sufficient for the deployment, so we ran a test by configuring an EC2 instance in AWS and installing a GHA runner on it: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners
A test was carried out on an EC2 VM of type t3a.large, which is equivalent in resources to the GitHub runners (4 vCPUs and 16 GB of RAM). The GHA runner service was installed and the test was run. The test passed: the process generates a high load on the VM, but none of the attempts showed the service crashes that occur on the GHA machines: https://github.com/wazuh/wazuh-docker/actions/runs/8450717246/job/23148062790
More tests were carried out with GHA, and the problem appears to be the growth of the Wazuh Docker images combined with the disk space consumed by the Vulnerability Detection feeds. The VM provided by GitHub has 14 GB of disk space and the VD feeds alone take up almost 5 GB, which leaves little space for the images, volumes, and everything else we load.
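The disk budget can be sketched as below. The 14 GB and 5 GB figures are the ones quoted in this comment; the function names are hypothetical, and `free_gib` simply reports what the standard library sees on the machine where it runs.

```python
import shutil

GIB = 1024 ** 3


def remaining_gb(total_gb: float, *consumers_gb: float) -> float:
    """Space left after subtracting each known consumer from the total, in GB."""
    return total_gb - sum(consumers_gb)


def free_gib(path: str = "/") -> float:
    """Free space currently reported for the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


# Figures from the comment above: 14 GB runner disk, almost 5 GB of VD feeds.
budget_for_images_and_volumes = remaining_gb(14, 5)
```

Under those figures, roughly 9 GB is left for the Docker images, volumes, and everything else the test loads. Logging `free_gib()` at the start and end of a job would show how close a run gets to exhausting the disk.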