Splunk Search Heads fail to start in Docker Swarm

Open rskntroot opened this issue 3 years ago • 2 comments

Issue Description:

I have been experimenting with Docker Swarm and have run into an issue where Splunk containers with role: search_head or search_head captain fail to start in a Docker Swarm environment.

Project Codebase:

https://github.com/rskntroot/splunk

NOTE: I understand that splunk in docker swarm is unsupported for a reason.

NOTE: I have managed to get a 3x [search_head] 1x [deployer] 1x [indexer] setup to work fully in docker swarm with the following workaround

Workaround:

  • [deployer] and [indexer] were configured with environment variables & defaults.yml
  • [search_head] containers were deployed without a role and then manually configured for shc

During testing of the workaround I have found that:

  • search functionality, working as expected.
  • app deployment, working as expected.
  • artifact replication, working as expected.

Issues with workaround:

  • Splunk search head configuration does not persist in the event the docker container fails (it will be rebuilt with no role)

Conclusion:

Pre workaround: I was able to docker exec into a search container, but from there I was unable to connect to the other search nodes. There were no issues connecting to the deployer or indexer.

It seems that the splunk-ansible configurations do not put the container in a state where docker swarm will publish the container's IP to docker DNS.

~~I'm at wits-end on this one and was wondering if anyone wants to give me some pointers on how to create an ansible playbook for this case~~ 🤷🏻‍♂️ (setting docker state is handled in entrypoint.sh)

rskntroot avatar Jan 31 '22 00:01 rskntroot

Closing issue as it is not related to splunk's docker image configuration. Opened issue in splunk-ansible here: https://github.com/splunk/splunk-ansible/issues/672

rskntroot avatar Feb 01 '22 05:02 rskntroot

Reopening as splunk-ansible commands do not impact the status of the container. The docker service does not publish the container in DNS until the container is "healthy".
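To illustrate the dependency described here, the following is a minimal sketch of a state-file-based health probe. It assumes the health check reads the state file that entrypoint.sh writes (`splunk-container.state` under `CONTAINER_ARTIFACT_DIR`) and treats the value `started` as healthy. The helper name `container_is_healthy` and the scratch directory are hypothetical, and the image's real health check may do more than this.

```shell
#!/bin/sh
# Sketch only: not the image's actual health check script.

# Hypothetical helper: succeeds only when the state file contains "started".
# $1 is the directory standing in for CONTAINER_ARTIFACT_DIR.
container_is_healthy() {
    state_file="$1/splunk-container.state"
    [ -f "$state_file" ] || return 1
    [ "$(cat "$state_file")" = "started" ]
}

# Demo against a scratch directory (assumption: stands in for the real artifact dir).
demo_dir="$(mktemp -d)"

# Before the state file exists, the probe fails.
if container_is_healthy "$demo_dir"; then before="healthy"; else before="unhealthy"; fi

# Once "started" is written, the probe succeeds.
echo "started" > "$demo_dir/splunk-container.state"
if container_is_healthy "$demo_dir"; then after="healthy"; else after="unhealthy"; fi

echo "before state file: $before"
echo "after state file:  $after"
```

Under this assumption, a search head that needs to reach its peers during role setup never reaches the point where the file is written, so it is never reported healthy and never published in DNS.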

After taking a closer look at entrypoint.sh, it seems that the issue can be resolved in this file. I was able to resolve the issue with docker DNS by setting the container status to started upon the completion of prep_ansible. This is obviously not ideal.

Recommendation: Split setup into two phases, (common) and (splunk_role). After the common setup phase completes, set the container as healthy:

ansible-playbook <splunk common phase>
echo "started" > "${CONTAINER_ARTIFACT_DIR}/splunk-container.state"
ansible-playbook <splunk role phase>
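
The ordering recommended above can be sketched as a runnable script. This is only an illustration of the sequencing, not a patch to entrypoint.sh: `run_common_phase` and `run_role_phase` are hypothetical stand-ins for the two ansible-playbook invocations, and `CONTAINER_ARTIFACT_DIR` falls back to a scratch directory (an assumption) so the sketch can run outside the image.

```shell
#!/bin/sh
set -e

# Assumption: fall back to a scratch directory when not running inside the image.
CONTAINER_ARTIFACT_DIR="${CONTAINER_ARTIFACT_DIR:-$(mktemp -d)}"
STATE_FILE="${CONTAINER_ARTIFACT_DIR}/splunk-container.state"

# Hypothetical stand-ins for the two ansible-playbook runs.
run_common_phase() { echo "common phase: role-independent setup"; }
run_role_phase()   { echo "role phase: role-specific setup that needs peer DNS"; }

# Write the state value that marks the container as started.
mark_started() { echo "started" > "${STATE_FILE}"; }

run_common_phase
mark_started      # container can now be reported healthy and published in DNS
run_role_phase    # peers should be resolvable by the time this runs
```

The trade-off is that the container reports as started before role configuration has finished, so anything that relies on that state to mean "fully configured" would need a separate signal.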

rskntroot avatar Feb 03 '22 19:02 rskntroot