wazuh-qa
Deployability testing tier 1
Description
The objective of this issue is to thoroughly test the deployment of Wazuh packages on tier 1 operating systems and architectures. This includes fully automated tests integrated into Wazuh's CI processes.
This testing should focus on reliability, a lightweight footprint, and speed. From now on, Deployability testing tier 1 is referred to as DTT1.
Functional requirements
DTT1 includes the following combination of operating systems, versions, and architectures:
Operating System | Version | Component | Architectures |
---|---|---|---|
RedHat | 7 | agents, central components | x86_64, aarch64 |
RedHat | 8 | agents, central components | x86_64, aarch64 |
RedHat | 9 | agents, central components | x86_64, aarch64 |
CentOS | 7 | agents, central components | x86_64, aarch64 |
CentOS | 8 | agents, central components | x86_64, aarch64 |
Debian | 10 | agents, central components | x86_64, aarch64 |
Debian | 11 | agents, central components | x86_64, aarch64 |
Debian | 12 | agents, central components | x86_64, aarch64 |
Ubuntu | 18 | agents | x86_64, aarch64 |
Ubuntu | 20 | agents, central components | x86_64, aarch64 |
Ubuntu | 22 | agents, central components | x86_64, aarch64 |
Oracle Linux | 9 | agents, central components | x86_64, aarch64 |
Amazon Linux | 2 | agents, central components | x86_64, aarch64 |
Amazon Linux | 2023 | agents, central components | x86_64, aarch64 |
openSUSE | 15 | agents, ~central components~ | x86_64, aarch64 |
~SUSE~ | ~15~ | ~agents, central components~ | ~x86_64, aarch64~ |
~Fedora~ | ~38~ | ~agents~ | ~x86_64, aarch64~ |
Windows | 10 | agents | x86_64 ~, aarch64~ |
Windows | ~11~ | ~agents~ | ~x86_64, aarch64~ |
Windows | Server 2012 | agents | x86_64 ~, aarch64~ |
Windows | Server 2012 R2 | agents | x86_64 ~, aarch64~ |
Windows | Server 2016 | agents | x86_64 ~, aarch64~ |
Windows | Server 2019 | agents | x86_64 ~, aarch64~ |
Windows | Server 2022 | agents | x86_64 ~, aarch64~ |
macOS | Ventura | agents | x86_64, aarch64 |
macOS | Sonoma | agents | x86_64, aarch64 |
The operating systems from Fedora onwards are included in tier 2, because their development on the allocation side has not been completed.
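As a rough illustration of how the table above could be turned into individual test targets, the sketch below expands a few rows into one entry per (OS, version, component, architecture) combination. The `Target` class, the `MATRIX` layout and the `expand` helper are hypothetical names chosen for this example; only the row values come from the table.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Target:
    """One deployment target (hypothetical structure, for illustration only)."""
    os_name: str
    version: str
    component: str
    arch: str


# A few rows copied from the table above; the remaining rows follow the same shape.
MATRIX = [
    ("Debian", "12", ("agents", "central components"), ("x86_64", "aarch64")),
    ("Ubuntu", "18", ("agents",), ("x86_64", "aarch64")),
    ("Windows", "10", ("agents",), ("x86_64",)),
]


def expand(matrix):
    """Return one Target per (OS, version, component, architecture) combination."""
    targets = []
    for os_name, version, components, arches in matrix:
        for component, arch in product(components, arches):
            targets.append(Target(os_name, version, component, arch))
    return targets


targets = expand(MATRIX)
```

Keeping the matrix as plain data means that moving an OS between tiers is a one-line change rather than a change to the test logic.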
Agents
High-level phases Agents
- DTT1 includes the following high-level phases:
- Install
- Registration
- Connection
- Basic info (OS, arch, version)
- Uninstall
- Restart
Phase | Requirement |
---|---|
Install | Install using Wazuh dashboard's Deploy new agent wizard section |
Install | Ensure files have appropriate permissions (Checkfiles close-world) |
Install | Start using wazuh-control binary |
Registration | Enroll using ossec.conf targeting a specific manager |
Connection | Establish a connection with a single manager via TCP |
Basic info | Ensure the OS is accurately reported |
Basic info | Ensure the architecture (arch) is accurately reported |
Basic info | Ensure the version is accurately reported |
~Upgrade~ | ~Ensure file permissions are maintained post-upgrade (Checkfiles close-world)~ |
~Upgrade~ | ~Ensure configuration is maintained post-upgrade (ossec.conf, agent.conf, local_internal_options.conf)~ |
Restart | Restart using wazuh-control binary |
Restart | Ensure successful reconnection post-restart |
Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
Stop | Ensure agent properly disconnects |
Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports) |
Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
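The file permission requirement in the table above could be checked along the following lines. This is a minimal sketch that assumes "close-world" means every file under the installation directory must appear in an expected manifest, so unlisted files are reported as well. The `check_files` helper and the manifest format are hypothetical, and ownership is not checked here.

```python
import os
import stat


def check_files(root, expected):
    """Compare files under root with expected {relative_path: octal mode}.

    Returns a sorted list of human-readable problems; an empty list means
    the tree matches the manifest exactly.
    """
    problems = []
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            seen.add(rel)
            # Keep only the permission bits of the file mode.
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            if rel not in expected:
                problems.append(f"unexpected file: {rel}")
            elif mode != expected[rel]:
                problems.append(
                    f"wrong mode for {rel}: {mode:o} != {expected[rel]:o}"
                )
    # Anything listed in the manifest but not found on disk is missing.
    for rel in expected.keys() - seen:
        problems.append(f"missing file: {rel}")
    return sorted(problems)
```

The same helper could be run after install and again after uninstall with a different manifest, for example one that lists only the configuration files that are expected to remain.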
Central components
High-level phases Central components
- DTT1 includes the following high-level phases:
- Install
- Connection
- Uninstall
- Restart
Phase | Requirement |
---|---|
Install | Install via Quickstart |
Install | Ensure files have appropriate permissions (Checkfiles close-world) |
Install | Start using service |
Connection | Ensure the component under test successfully connects with the other central components |
~Upgrade~ | ~Confirm the new version is accurately reported~ |
Restart | Restart using service |
Restart | Ensure successful reconnection post-restart with the other central components |
Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
Stop | Ensure agent properly disconnects |
Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports, files) |
Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
Non-functional requirements
- All DTT1 test phases must comply with the following requirements:
- Ensure the maximum time defined for the specific phase is not reached
- Ensure no errors are found in logs for the specific phase
- DTT1 tests must be deployed, provisioned, executed, and collected with a modular design
- DTT1 test CI executions must be monitored and reviewed by the QA team daily/weekly
- DTT1 test evaluation criteria must be defined and accessible by all QA team members
- The DTT1 test escalation process must be defined and accessible by all QA team members
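The two per-phase requirements above (a time limit and no errors in logs) could be enforced by a small wrapper around each phase. The sketch below is an assumption about how that might look: `run_phase`, `find_log_errors` and the `ERROR`/`CRITICAL` markers are hypothetical, and the time limit is checked after the phase finishes rather than by interrupting it.

```python
import re
import time

# Assumed error markers; the real log format may need a different pattern.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b")


def find_log_errors(lines):
    """Return the log lines that contain an error marker."""
    return [line for line in lines if ERROR_PATTERN.search(line)]


def run_phase(name, action, max_seconds, read_log_lines):
    """Run one phase and return (ok, reasons).

    action: callable that performs the phase.
    read_log_lines: callable returning the log lines produced by the phase.
    """
    start = time.monotonic()
    action()
    elapsed = time.monotonic() - start

    reasons = []
    # "Not reached" is read strictly: hitting the limit counts as a failure.
    if elapsed >= max_seconds:
        reasons.append(f"{name}: took {elapsed:.1f}s, limit is {max_seconds}s")
    for line in find_log_errors(read_log_lines()):
        reasons.append(f"{name}: error in log: {line.strip()}")
    return (not reasons, reasons)
```

Returning the list of reasons, rather than only a boolean, keeps the failure cause visible in the CI output.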
Hardware
Agent
- Hardware:
- CPU: 1
- RAM: 500 MB
- Upgrade:
- From the previous patch
- From the previous minor
Central components
- Hardware:
- CPU: 4
- RAM: 8 GB
- Upgrade:
- From the previous patch
- From the previous minor
Implementation restrictions
- The DTT1 CI architecture and infrastructure must be designed and developed in Jenkins.
- The DTT1 tests must be programmed in Python.
- The DTT1 must use OSs deployed using virtual machines.
Plan
First iteration
Objective:
The objective of this iteration is to generate the skeleton of the modules and to start detecting problems that may arise from the new architecture. To this end, the PoC described in the following issues will be carried out.
- https://github.com/wazuh/wazuh-qa/issues/4519
- https://github.com/wazuh/wazuh-qa/issues/4524
- https://github.com/wazuh/wazuh-qa/issues/4736
Results:
The PoC was carried out and the modules were generated. The following problems were encountered during development:
- The Collector module is not necessary; it was absorbed by the Observability module.
- All modules need to be improved so that they:
- Perform schema validation with pydantic to validate the inputs they receive.
- Are self-sufficient and independent, so that they can be called from anywhere without requiring too many parameters.
- Make diagrams of each module with enough detail to understand it.
- Redefine the inputs and outputs of each module, since they were not finalized in the PoC.
- Investigate the need for a flow orchestrator, so that use cases can be defined easily at a high level and the orchestrator then executes the required modules for each case.
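The input validation point above can be illustrated with a small fail-fast model. The issue names pydantic for this; the sketch below deliberately uses a plain standard-library dataclass instead, to stay dependency-free, and only shows the idea of rejecting bad inputs at the module boundary. The `AllocationInput` class, its fields and the example values are hypothetical.

```python
from dataclasses import dataclass

# Providers mentioned in this issue; the lowercase spelling is an assumption.
VALID_PROVIDERS = {"vagrant", "aws"}


@dataclass(frozen=True)
class AllocationInput:
    """Hypothetical input model for a module; validated on construction."""
    provider: str
    size: str
    os_name: str

    def __post_init__(self):
        errors = []
        for field_name in ("provider", "size", "os_name"):
            value = getattr(self, field_name)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"{field_name} must be a non-empty string")
        if self.provider not in VALID_PROVIDERS:
            errors.append(f"provider must be one of {sorted(VALID_PROVIDERS)}")
        if errors:
            # Report every problem at once so the caller can fix them together.
            raise ValueError("; ".join(errors))
```

With pydantic, the same constraints would be declared as field types on a model class and the library would produce the error report.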
Second iteration:
Objective:
This iteration resolves the problems found in the previous one. After the weekly meeting (https://github.com/wazuh/wazuh-qa/issues/4495#issuecomment-1853040846), it was decided to investigate tools based on the DAG methodology and use one of them as the orchestrator. The modules will also be refined as proposed.
- https://github.com/wazuh/wazuh-qa/issues/4766
- https://github.com/wazuh/wazuh-qa/issues/4796
- Improve Jenkins modules
- https://github.com/wazuh/wazuh-qa/issues/4746
- https://github.com/wazuh/wazuh-qa/issues/4749
- https://github.com/wazuh/wazuh-qa/issues/4750
- https://github.com/wazuh/wazuh-qa/issues/4751
Results:
All the problems and topics found in iteration 1 were addressed. In addition, some points of improvement were found while the new functionality was developed and implemented:
General
- Document the usage of each module (TaskFlow, Allocation, Provision, Test and Observability)
- Generate class or flow diagrams for each module
- Improve validations and error handling, since the reason for a failure is not clear when a module fails
  - 3.1 TaskFlow
  - 3.2 Allocation
  - 3.3 Provision
  - 3.4 Test
- Define and implement a Logger
  - 4.1 Define centralized log
  - 4.2 Format
  - 4.3 Levels
  - 4.4 Output file per module (level debug) + Jenkins log (level info)
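The logger item above (a debug-level file per module plus an info-level Jenkins log) maps onto two handlers in Python's standard `logging` module. This is a minimal sketch, assuming Jenkins captures the job's standard output; the `build_logger` helper, the `dtt1.` logger prefix and the message format are hypothetical.

```python
import logging
import sys


def build_logger(module_name, log_path):
    """Return a logger that writes DEBUG+ to a file and INFO+ to stdout."""
    logger = logging.getLogger(f"dtt1.{module_name}")
    logger.setLevel(logging.DEBUG)   # let handlers decide what to keep
    logger.propagate = False         # avoid duplicate lines via the root logger
    if logger.handlers:              # already configured, reuse as-is
        return logger

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"
    )

    # Per-module output file with full detail.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    # Console output (assumed to end up in the Jenkins log), less verbose.
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

A module would call, for example, `build_logger("allocation", "allocation.log")` once and use the returned logger everywhere, which keeps format and levels consistent across modules.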
TaskFlow
- Delete the schema validator parameter and use it internally
Allocation
- Move the Inventory model to module generics so every module uses the same Inventory model
- Add more sizes and OS for Vagrant providers
- Validate the working OS in Vagrant
- Add more sizes and OS for AWS
- Validate the working OS in AWS
- Special VMs
- Enable custom VM configuration for both the Vagrant and AWS providers
- Improve or remove the function that loads an existing Credential for a VM (currently not working; Vagrant only)
- Add name and type labels to AWS instances so that costs can be calculated and tracked
- Unify size types for Vagrant and AWS
Provision
- Add the uninstaller action by parameter to uninstall the desired component
- Allow installing any version of Wazuh from packages (currently only possible with AIO)
- Get ansible_os_family to render templates with jinja2. This makes it easier to reuse templates
- Validate dependency tree
  - 4.1 Validate the working OS in Vagrant
  - 4.2 Validate the working OS in AWS
  - 4.3 Adapt the dependencies installed for the tests so that they work on other systems such as CentOS 8
- Special VMs
- Improve or remove the function that loads an existing Credential for a VM (currently not working; Vagrant only)
Testing
- Add Utils to test using the Wazuh API
- Add Utils to check all file permissions and ownership
- Add test for manager
- Test uninstall
- Remove the usage of the Playbook class to use just Ansible
Observability
- Define the usage of the pytest-influxdb plugin for the tests
  - 1.1 If we decide to use it, carry out the implementation
- Define the new dashboards to be implemented according to the new module definitions. This requires analysis and definition of the dashboards
- Obtain new logs from the modules to view them on a dashboard. Depends on General 4
- Investigate generating a dashboard that shows the DAG generated by TaskFlow
Jenkins
- Adapt the Jenkins pipeline to execute TaskFlow with dry-run to generate the DAG
- Adapt the Jenkins pipeline to execute TaskFlow to stop the running process
Iteration 3:
Objective:
After iteration 2, the following points emerged that will be the goal of the last iteration of the project.
Tasks:
General
- [x] https://github.com/wazuh/wazuh-qa/issues/4871
Workflow engine
- [x] https://github.com/wazuh/wazuh-qa/issues/4905
Provision
- [x] https://github.com/wazuh/wazuh-qa/issues/4852
Allocation
- [x] https://github.com/wazuh/wazuh-qa/issues/4855
Tests
- [x] https://github.com/wazuh/wazuh-qa/issues/4848
Add Copyright
- [x] https://github.com/wazuh/wazuh-qa/issues/5141
Release
- [x] https://github.com/wazuh/wazuh-qa/issues/5125
Results:
Issue to include in DTT Tier 2
Develop automated unit tests
Objective:
The objective of this stage is to create automated unit tests for each module. This work is expected to continue in DTT2. It is included in DTT1 in order to start defining the test cases and how they are automated.
- [ ] https://github.com/wazuh/wazuh-qa/issues/4993
Implement best practices
- [ ] https://github.com/wazuh/wazuh-qa/issues/5116
Jenkins implementation
- [ ] https://github.com/wazuh/wazuh-qa/issues/4849
Observability
- [ ] https://github.com/wazuh/wazuh-qa/issues/4837
Post-development:
- Set and define a calendar and on-call schedules for the CI reviewers
- Document the evaluation criteria
- Document the escalation process
Branch
- https://github.com/wazuh/wazuh-qa/tree/enhancement/4495-DTT1
Approved by
DRI name: @davidjiglesias CTO: @havidarou Objective: Bulletproof deployability tier1
The order of execution of the tests must be modified. An upgrade requires the previous version to be installed first, which is not possible with the current proposal because the version under test is installed first:
- Install
- Registration
- Connection
- Basic info
- Restart
- Stop
- Uninstall
- Upgrade
Requirements review
OS and architecture unavailability
- Windows Server editions are not available on the arm64 architecture
- Windows 11 and Windows 10 are available only as Insider Preview builds
Agent's hardware requirements do not meet OS minimum requirements
The following OSes have higher hardware requirements:
- Windows Server 2019: 1.5 GB
- Windows Server 2022: 2 GB
- macOS Sonoma https://support.apple.com/en-us/HT213772 (not stated directly; the recommended systems listed there should be checked)
- macOS Ventura https://support.apple.com/en-us/HT213264 (not stated directly; the recommended systems listed there should be checked)
Known problems
- Central components on ARM64 due to filebeat on arm
- Windows arm64 native package does not exist yet
Managers on those OSes that only support Agents
- Define a fixed/dynamic policy of manager selection to improve testing coverage
Test order
- From production to the current version
- Install the last production version (based on criteria, last patch, or version)
- From the current version to the future version (dummy, same but to test upgradeability)
- From production to future version
Wazuh Manager and Wazuh Agent test interleaving
Wazuh Agent tests need some validation from the manager side (registration, connection), but at the same time the Wazuh Manager has its own tests. The idea is to define an optimal, decoupled test flow that meets the requirements in the shortest possible time.
Requirements review
- Amazon Linux latest is the OS chosen for the manager used in the agents testing
- Agents and central components tests will be executed independently
Draw a high-level diagram of the modules workflow
Weekly Minutes DTT1
Participants: Kevin, Victor, Raul, Nico, Fede and David.
Conclusions: After the weekly meeting on DTT, the team identified the need to adopt the DAG methodology in order to have an execution orchestrator that defines, in a simple and user-friendly way, the test cases to be carried out. It must be flexible enough to execute any use case in parallel, and its output must be the YAML used by the already defined modules (Allocation, Provision and Test). An analysis of the proposed tools, with the advantages and disadvantages of each, is required so that the team can choose the tool that natively fits our needs. It must be simple, intuitive and scalable to use. The following issue was created to track this: https://github.com/wazuh/wazuh-qa/issues/4766
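To make the DAG idea above concrete, the sketch below orders a small task graph with the standard-library `graphlib` module and groups tasks that could run in parallel. It does not assume which tool the team chose. The task names and their dependencies are hypothetical, and the output is a plain list of batches rather than the YAML mentioned above.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its set.
TASKS = {
    "allocate-manager": set(),
    "allocate-agent": set(),
    "provision-manager": {"allocate-manager"},
    "provision-agent": {"allocate-agent", "provision-manager"},
    "test-agent": {"provision-agent"},
}


def plan_batches(graph):
    """Group tasks into ordered batches.

    Tasks in the same batch have all their dependencies satisfied by earlier
    batches, so they could be executed in parallel.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable plan
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = plan_batches(TASKS)
```

Computing the plan without executing anything is effectively a dry run: it shows the execution order and detects cyclic definitions before any machine is allocated.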
Moved ETA from 2024/03/29 to 2024/04/05 due to:
- Argentina team meetings (weeks 11-15, 18-22)
- Spain/Argentina holidays 2024/03/(28,29), 2024/04/(1,2)
Moved ETA to 2024/04/09 due to:
- https://github.com/wazuh/wazuh-qa/issues/4996
- https://github.com/wazuh/wazuh-qa/issues/4994
- https://github.com/wazuh/wazuh-qa/issues/5125
- After asking @havidarou, @davidjiglesias, and @gdiazlo about issue https://github.com/wazuh/internal-devel-requests/issues/187, it has been confirmed that the ARM Wazuh manager will not be developed until Wazuh indexer and Wazuh dashboard are available for ARM systems
This comment tracks all bug issues opened from DTT1 that should be handled as bug issues rather than DTT-related issues; that is, they will be worked on after this issue is closed
- https://github.com/wazuh/wazuh-qa/issues/5196
- https://github.com/wazuh/wazuh-qa/issues/5197
- https://github.com/wazuh/wazuh-qa/issues/5193
We need the changes from issue https://github.com/wazuh/wazuh-qa/issues/5198 in order to merge this development
Moved ETA to 16/04/2024 due to https://github.com/wazuh/wazuh-qa/issues/5198 (based on issue ETA)
A new issue has been opened because we need to adapt the test module to use a single manager: #5202 (same ETA). Desirable, but not a blocker: https://github.com/wazuh/wazuh-qa/issues/5203
The automation section is removed because it will be worked on in DTT2
Automation
- DTT1 tests must run in Nightly CI
- DTT1 tests must run in Weekly CI
- DTT1 tests must run in pre-release testing
- DTT1 tests costs must be measurable
Moved ETA to 29/04/2024 as we have to work on the following issues
- https://github.com/wazuh/wazuh-qa/issues/5218 (EPIC)
- Will have two issues: macOS and Windows
- https://github.com/wazuh/wazuh-qa/issues/5219
- https://github.com/wazuh/wazuh-qa/issues/5220
- https://github.com/wazuh/wazuh-qa/issues/5221
We need the following issue from the DevOps team
- https://github.com/wazuh/wazuh-qa/issues/5198
As 4.9.0 is targeted for 2/05/2024, we plan to use 30/04 and 2/05 to test and retrieve metrics
Moved the ETA to 3/5/2024 as 1/5/2024 is a holiday and we need some time to test the changes in the main branch (https://github.com/wazuh/wazuh-qa/issues/5191). This has been discussed and approved with @davidjiglesias
Pending DTT1 issues by team, with their ETAs:
Team | Issue | Current ETA |
---|---|---|
@wazuh/devel-devops | https://github.com/wazuh/wazuh-qa/issues/5295 | 7/5/2024 |
@wazuh/devel-devops | https://github.com/wazuh/wazuh-qa/issues/5311 | 10/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5240 | 2/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5230 | 3/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5218 | 3/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5219 | 6/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5191 | 15/5/2024 |
@wazuh/devel-qa-div1 | https://github.com/wazuh/wazuh-qa/issues/5323 | 3/5/2024 |
The ETA of this issue changes to Monday 15/5/2024 so that we can test all changes (issue #5191)
Removed Windows ARM from OS list as there is no Windows ARM available yet
ETA moved to 18 June https://github.com/wazuh/wazuh-qa/issues/5191#issuecomment-2171694702
LGTM
The branch must be kept alive until the Agent team changes the GHA workflow references