wazuh-packages
Prevent cluster failure to remove Wazuh installation
| Related issue |
|---|
| https://github.com/wazuh/wazuh/issues/14422 |
Description
The method `installCommon_rollBack` is used extensively to roll back the Wazuh installation in case of a failure. However, for the `-s|--start-cluster` operation, which triggers the `indexer_startCluster` function, a failure does not require removing the indexer installation. A DNS or firewall issue might affect cluster discovery while the indexer components themselves are running perfectly.
This PR makes two changes to the installation process:
- Add the `--diagnose` flag to the `securityadmin.sh` call to improve the troubleshooting experience with OpenSearch issues.
- Remove the `installCommon_rollBack` method from `indexer_startCluster`. Instead, it backs up the default settings to a path in wazuh-indexer's home directory. If something goes wrong during cluster initialization, it restores the default settings. It then removes the backup files whether the setup succeeds or fails.
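The backup, restore, and cleanup flow described in the second item can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name `run_with_settings_backup`, its arguments, and the directory layout are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: protect a settings directory while a risky step runs.
# Names and paths are illustrative assumptions, not the installer's real ones.

run_with_settings_backup() {
    local config_dir="$1"   # directory holding the settings to protect
    local backup_dir="$2"   # temporary location for the copy
    shift 2                 # remaining arguments: the command to run

    # Take a copy of the current (default) settings.
    rm -rf "${backup_dir:?}"
    mkdir -p "${backup_dir}"
    cp -a "${config_dir}/." "${backup_dir}/"

    # Run the step and remember its exit status without aborting.
    local status=0
    "$@" || status=$?

    if [ "${status}" -ne 0 ]; then
        # The step failed: discard whatever it wrote and restore the copy.
        find "${config_dir:?}" -mindepth 1 -delete
        cp -a "${backup_dir}/." "${config_dir}/"
    fi

    # Remove the backup whether the step succeeded or failed.
    rm -rf "${backup_dir:?}"
    return "${status}"
}
```

A caller would wrap the cluster initialization command, for example `run_with_settings_backup "${config_dir}" "${backup_dir}" some_init_command`, where all three arguments are placeholders. The indexer packages stay installed in both outcomes; only the settings directory is touched.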
NB! This is a duplicate of PR 1772, which was overwritten by an accidental force push.
Logs example
N/A
Tests
Tests are not applicable for the PR.
- Build the package in any supported platform
- [ ] Linux
- [ ] Windows
- [ ] macOS
- [ ] Solaris
- [ ] AIX
- [ ] HP-UX
- [ ] Package installation
- [ ] Package upgrade
- [ ] Package downgrade
- [ ] Package remove
- [ ] Package install/remove/install
- [ ] Change added to CHANGELOG.md
- Tests for Linux RPM
- [ ] Build the package for x86_64
- [ ] Build the package for i386
- [ ] Build the package for armhf
- [ ] Build the package for aarch64
- [ ] `%files` section is correctly updated if necessary
- Tests for Linux deb
- [ ] Build the package for x86_64
- [ ] Build the package for i386
- [ ] Build the package for armhf
- [ ] Build the package for aarch64
- [ ] Package install/remove/install
- [ ] Package install/purge/install
- [ ] Check file permissions after installing the package
- Tests for macOS
- [ ] Test the package from macOS Sierra to Mojave
- Tests for Solaris
- [ ] Test the package on Solaris 10
- [ ] Test the package on Solaris 11
- [ ] Check file permissions on Solaris 11 template
- Tests for IBM AIX
- [ ] `%files` section is correctly updated if necessary
- [ ] Check the changes from IBM AIX 5 to 7
- [ ]
Hello @zbalkan,
Thanks for your contribution. I have reviewed this PR and noticed the `-backup` option. This option is deprecated in OpenSearch 2.1/2.2, which is the target version for 4.4 and 4.5, so we will need to design a function to back up the current configuration without using `-backup`. If you want more information, take a look at https://github.com/opensearch-project/security/issues/1876
Hi @okynos,
I did that to replace the previous common rollback method. However, I am now reconsidering the need for a reset. If cluster initialization fails, do we need to reset to the defaults? If it fails and no settings have been changed, then there's nothing to roll back. If it fails and leaves a dirty configuration, then it is the Indexer's responsibility to roll back to the previous correct settings.
I decided to remove any reset/rollback logic from the script. Open to feedback.
Hello @zbalkan,
Thanks for your patience. You are right. I hadn't considered the case where we are deploying in a clean environment, so we won't need to back up the environment. However, my opinion is to keep the rollback step. If the cluster has a problem in the initialization step, it means there is a problem in the environment that has to be solved before deploying the stack. We settled on this behaviour as a Wazuh good practice to avoid a broken environment. Thanks for your contribution and feedback!
Hi @okynos,
I see the reasoning here. But one thing is being missed: the reason I stumbled on this issue in the first place.
Basically, the firewall was not set up, while everything else was installed correctly. So removing the Indexer is a huge issue. The cluster initialization starts after a successful Indexer installation, so there's no broken setup. Whatever is broken is not related to Wazuh but to the underlying OS and network.
That is why a rollback is both unnecessary and confusing. It lengthens troubleshooting because it makes the failure look like it was caused by Wazuh.
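To illustrate the alternative to a rollback, here is a rough sketch of reporting the failure and pointing at the network instead of removing anything. It is only an assumption of how such a check could look: the function names and the `host:port` argument format are hypothetical, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available.

```shell
#!/bin/bash
# Hypothetical sketch: on a cluster start failure, leave the installation
# in place and report which peers cannot be reached.

# Succeeds if a TCP connection to host:port opens within 3 seconds.
check_peer_reachable() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Arguments: one or more host:port pairs for the other cluster nodes.
# Returns 1 if at least one peer is unreachable, 0 otherwise.
report_cluster_start_failure() {
    local peer result=0
    echo "ERROR: cluster initialization failed. The indexer installation was left in place."
    for peer in "$@"; do
        if ! check_peer_reachable "${peer%:*}" "${peer##*:}"; then
            echo "WARNING: cannot reach ${peer}. Check DNS resolution and firewall rules."
            result=1
        fi
    done
    return "${result}"
}
```

Called with the peers from the deployment's configuration, this prints a warning per unreachable node, which keeps the focus on the OS and network rather than on the Wazuh packages.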
Hello @zbalkan
Thanks a lot for your contribution. After discussing with the team, we all agree that your PR is correct and will help users install and configure Wazuh, avoiding the annoying situations where a firewall or some other external factor interferes. These kinds of contributions are really appreciated, thanks again.
I will proceed to merge.