docker-splunk
Splunk Search Heads fail to start in Docker Swarm
Issue Description:
I have been experimenting with docker swarm and have run into an issue where splunk containers with role search_head or search_head captain fail to start in a docker swarm environment.
Project Codebase:
https://github.com/rskntroot/splunk
NOTE: I understand that splunk in docker swarm is unsupported for a reason.
NOTE: I have managed to get a 3x [search_head] 1x [deployer] 1x [indexer] setup to work fully in docker swarm with the following workaround
Workaround:
- [deployer] and [indexer] were configured with environment variables & defaults.yml
- [search_head] containers were deployed without a role and then manually configured for shc
During testing of the workaround I have found that:
- search functionality: working as expected.
- app deployment: working as expected.
- artifact replication: working as expected.
Issues with workaround:
- Splunk search head configuration does not persist in the event the docker container fails (it will be rebuilt with no role)
Conclusion:
Pre-workaround: I was able to docker exec into a search head container, but from there I was unable to connect to the other search head nodes. There were no issues connecting to the deployer or indexer.
It seems that the splunk-ansible configurations do not put the container in a state where docker swarm will publish the container's IP to docker DNS.
~~I'm at wits-end on this one and was wondering if anyone wants to give me some pointers on how to create an ansible playbook for this case~~ 🤷🏻♂️ (setting docker state is handled in entrypoint.sh)
Closing issue as it is not related to splunk's docker image configuration. Opened issue in splunk-ansible here: https://github.com/splunk/splunk-ansible/issues/672
Reopening as splunk-ansible commands do not impact the status of the container. The docker service does not publish the container in DNS until the container is "healthy".
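To illustrate the dependency described above, here is a minimal sketch of a health probe that passes only once a state file contains "started". The function name `container_is_started` and the default path are assumptions made for illustration; the image's real health check may do more than this.

```shell
#!/bin/sh
# Hypothetical, simplified health probe (not the image's actual script).
# Assumption: setup writes "started" into a state file when it is done.

container_is_started() {
    state_file="$1"
    # No state file yet means setup has not reached the point of writing it.
    [ -f "$state_file" ] || return 1
    # Healthy only when the file holds exactly "started".
    [ "$(cat "$state_file")" = "started" ]
}

# The default directory is an assumption for this sketch only.
DEFAULT_STATE="${CONTAINER_ARTIFACT_DIR:-/tmp/container_artifact}/splunk-container.state"

if container_is_started "$DEFAULT_STATE"; then
    echo "healthy"
else
    echo "unhealthy"
fi
```

With a probe of this shape, a search head that needs to reach its peers during role setup would wait on DNS entries that only appear after that same setup has finished, which matches the behaviour reported here.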
After taking a closer look at entrypoint.sh, it seems that the issue can be resolved in this file. I was able to resolve the issue with docker DNS by setting the container status to started upon the completion of prep_ansible. This is obviously not ideal.
Recommendation: Split setup into two phases, (common) and (splunk_role). After the common setup phase completes, set the container as healthy:
ansible-playbook < splunk common phase >
echo "started" > ${CONTAINER_ARTIFACT_DIR}/splunk-container.state
ansible-playbook < splunk role phase >
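The recommendation could be sketched roughly as follows. This is only an outline under stated assumptions: `run_common_phase` and `run_role_phase` are hypothetical stand-ins for the two ansible-playbook invocations, and the temporary directory fallback exists only so the sketch runs outside the container.

```shell
#!/bin/sh
# Sketch only: the two phase functions are hypothetical placeholders for
# the proposed (common) and (splunk_role) ansible-playbook runs.
CONTAINER_ARTIFACT_DIR="${CONTAINER_ARTIFACT_DIR:-$(mktemp -d)}"
STATE_FILE="${CONTAINER_ARTIFACT_DIR}/splunk-container.state"

run_common_phase() { echo "common phase: role-independent setup"; }
run_role_phase()   { echo "role phase: role-specific setup"; }

two_phase_setup() {
    mkdir -p "$CONTAINER_ARTIFACT_DIR"
    # Phase 1: if common setup fails, stop before marking the container.
    run_common_phase || return 1
    # Mark the container as started so the health check can pass and
    # swarm can publish the task in DNS before role setup begins.
    echo "started" > "$STATE_FILE"
    # Phase 2: role setup, which may need to resolve the other nodes.
    run_role_phase
}

two_phase_setup
```

One trade-off of this ordering is that the container reports healthy before its role is fully configured, so anything that treats "healthy" as "ready to serve searches" would need a separate readiness signal.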