wazuh-packages
Infinite loop when restarting `wazuh-indexer` with configuration error
While testing the `wazuh-indexer` package in 4.3.0-rc5, I noticed that if you restart the `wazuh-indexer` service with an error in the `/etc/wazuh-indexer/opensearch.yml` configuration file, the restart never finishes: the process stays in an infinite loop without showing any error.
Steps to reproduce it
1. Edit the file `/etc/wazuh-indexer/opensearch.yml` and set `network.host` to the following value: `network.host: "asd"`
2. Restart the `wazuh-indexer` service: `systemctl restart wazuh-indexer`
From this point on, the process will be stuck indefinitely.
Note: This has occurred on both DEB and RPM. Tested on CentOS 8 (`centos/8` Vagrant box image) and Debian 10 (`generic/debian10` Vagrant box image).
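When reproducing this, it helps to keep the terminal from hanging along with the service. The following is a minimal sketch, assuming coreutils `timeout` is available; the helper name `run_with_deadline` and the 120-second limit in the usage note are illustrative and not part of the package.

```shell
# Run a command, but give up waiting after a deadline (in seconds).
# Relies on coreutils `timeout`, which exits with status 124 when the
# deadline is reached.
run_with_deadline() {
    local seconds="$1" rc=0
    shift
    timeout "$seconds" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "still running after ${seconds}s: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_with_deadline 120 systemctl restart wazuh-indexer || systemctl status wazuh-indexer --no-pager`. Note that this only stops the `systemctl` client from waiting; the start job itself may keep running inside systemd.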
After some testing and research, this appears to be a specific error case inherited from the original OpenSearch product. We ran the same test against Elasticsearch 7.10.2 with the same results.
Explanation:
The `network.host` parameter is critical to the OpenSearch + Security startup. An error in this parameter causes the process to enter zombie mode, and it never notifies systemd that the process should be stopped.
The behavior is related to the security plugin: without it, the process exits with the real error. With the plugin, the way the process fails seems to keep systemd waiting longer for the service to start. I have checked the timeout-related options in https://www.freedesktop.org/software/systemd/man/systemd.service.html without success:
- `TimeoutStartFailureMode=`
- `TimeoutStartSec=`
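For reference, an override of these two options is usually expressed as a systemd drop-in. The sketch below only illustrates what such an override looks like; as noted above, tuning these options did not resolve the hang. The helper name `write_timeout_dropin`, the file name `timeout.conf`, and the values are assumptions for illustration.

```shell
# Write a systemd drop-in that bounds the start phase of a unit.
# The directory argument and the 120-second default are illustrative.
write_timeout_dropin() {
    local dropin_dir="$1" start_timeout="${2:-120}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/timeout.conf" <<EOF
[Service]
TimeoutStartSec=${start_timeout}
# Needs a systemd release that knows this option (246 or newer, as far
# as I know); the releases on older distributions may not support it.
TimeoutStartFailureMode=terminate
EOF
}
```

A possible use is `write_timeout_dropin /etc/systemd/system/wazuh-indexer.service.d && systemctl daemon-reload`, followed by a new restart attempt.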
Some background research:
- https://discuss.elastic.co/t/name-or-service-not-known/284831/6
- https://discuss.elastic.co/t/error-initialize-name-or-service-not-known/141357

Service file: https://github.com/wazuh/wazuh-packages/blob/4.3/stack/indexer/base/files/usr/lib/systemd/system/wazuh-indexer.service
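Since the trigger is a `network.host` value that presumably does not resolve to an address, one possible workaround is a pre-flight check before restarting. This is a rough sketch, not part of the package: the function names are hypothetical, `getent` is assumed to be available (glibc systems), and the parser only handles a single scalar value on one line, not lists or trailing comments.

```shell
# Print the scalar value of network.host from an opensearch.yml-style file.
# Commented-out lines are ignored; surrounding quotes are stripped.
get_network_host() {
    sed -n 's/^[[:space:]]*network\.host:[[:space:]]*//p' "$1" \
        | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Return 0 if network.host looks usable, non-zero if it does not resolve.
check_network_host() {
    local host
    host="$(get_network_host "$1")"
    if [ -z "$host" ]; then
        return 0    # not set in the file: nothing to check
    fi
    case "$host" in
        _*) return 0 ;;    # special values such as _local_ or _site_
    esac
    getent ahosts "$host" >/dev/null 2>&1
}
```

A restart could then be guarded with `check_network_host /etc/wazuh-indexer/opensearch.yml && systemctl restart wazuh-indexer`.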
It was not replicated in https://github.com/wazuh/wazuh/issues/18828