
Splunk workload management (WLM) in a container

Open dadux opened this issue 4 years ago • 4 comments

We'd like to migrate our Splunk cluster to containers using the docker-splunk image. The last remaining blocker we've identified is that we cannot enable the workload management configuration in the container with recent versions of Splunk (>= 7.3).

It fails the pre-flight checks:

Workload Management Preflight Checks failed. Fix the following issues:
	CPU Splunk base directory Splunkd.service requires read and write permissions.
	CPU Splunk base directory Splunkd.service is missing.
	The 'Delegate' property in the unit file must be set to 'true'. Restart Splunk then rerun preflight checks.
	In the unit file, the 'Restart' property must be set to 'always'. The 'ExecStart' property must include '_internal_launch_under_systemd'. Make sure the up-to-date unit file is loaded.
	Memory Splunk base directory Splunkd.service requires read and write permissions.
	Memory Splunk base directory Splunkd.service is missing.
	Unit file Splunkd.service is missing. Restart Splunk then rerun preflight checks.
bin/splunk version
Splunk 7.3.5 (build 86fd62efc3d7)

It appears to be looking for a systemd unit and the associated cgroups, which obviously don't exist in the container.
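A quick way to see what the preflight checks are tripping over is to check, from inside the container, whether systemd is running and whether the per-unit cgroup directories exist and are writable. The sketch below is only a rough diagnostic. The cgroup v1 mount point and the `system.slice/Splunkd.service` sub-path are assumptions on my part, not something the preflight output spells out, so override `CGROUP_ROOT` and `UNIT_SLICE` as needed.

```shell
#!/bin/sh
# Report the state of one cgroup directory, mirroring the two complaints
# in the preflight output: "is missing" and "requires read and write permissions".
check_cgroup_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "not-rw"
    else
        echo "ok"
    fi
}

# /run/systemd/system exists only when systemd is the running init system.
systemd_is_running() {
    [ -d /run/systemd/system ]
}

# Assumed layout (hypothetical defaults): cgroup v1, unit under system.slice.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"
UNIT_SLICE="${UNIT_SLICE:-system.slice/Splunkd.service}"

if systemd_is_running; then
    echo "systemd: running"
else
    echo "systemd: not running"
fi

for controller in cpu memory; do
    path="$CGROUP_ROOT/$controller/$UNIT_SLICE"
    echo "$controller: $(check_cgroup_dir "$path") ($path)"
done
```

In a typical container without systemd, this would be expected to report that systemd is not running and that both directories are missing, which lines up with the preflight failures above.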

I understand this is not an issue with docker-splunk per se, but it would be nice to find a workaround, as running without a systemd unit is common container behaviour.

dadux, Jun 01 '20 03:06

It's worth mentioning that I was able to run WLM successfully in Splunk 7.2.10 in a container (and in Kubernetes), where there are no pre-flight checks.

So it appears to be a regression, and/or the documentation is not up to date for more recent versions.

The following works for 7.2.10:

A couple of things are required to get it working, and they can be run as an Ansible pre-task:

  • apt-get update && apt-get install cgroup-tools
  • cgcreate -g cpu:splunk
  • cgcreate -g memory:splunk

And also

  • run your container as root or enable SUDO and chown the newly created cgroup
  • run your container as privileged (needed for /sys/fs to be rw)
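Put together, the steps above could look roughly like the script below. It is a sketch, not what the docker-splunk image does. The `splunk` user name and the cgroup v1 paths under `/sys/fs/cgroup` are assumptions, and the `DRY_RUN` switch is something I added so the commands can be reviewed before running them as root in a privileged container.

```shell
#!/bin/sh
# Sketch of the cgroup pre-task for Splunk 7.2.10 in a container.
# Assumptions (not confirmed in this thread): Splunk runs as user "splunk",
# and cgroup v1 controllers are mounted under /sys/fs/cgroup.
SPLUNK_USER="${SPLUNK_USER:-splunk}"
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_wlm_cgroups() {
    run apt-get update
    run apt-get install -y cgroup-tools
    for controller in cpu memory; do
        # Create the "splunk" cgroup under each controller ...
        run cgcreate -g "${controller}:splunk"
        # ... and hand it to the Splunk user so splunkd can create its pools.
        run chown -R "${SPLUNK_USER}:${SPLUNK_USER}" "${CGROUP_ROOT}/${controller}/splunk"
    done
}

setup_wlm_cgroups
```

Run as-is, it only prints the commands. Setting `DRY_RUN=0` executes them, which needs root and a writable `/sys/fs`.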

Once that's done, WLM is enabled successfully and the cgroups are created:

# $SPLUNK_HOME/bin/splunk show workload-management-status
	Workload Management Status:
		Enabled: 1
		Supported: 1
		Ingest Pool: pool_2
		Default Pool: pool_1
		Error:

	Workload Pools:
		pool_1:
			CPU Group: /sys/fs/cgroup/cpu/splunk/pool_1
			Memory Group: /sys/fs/cgroup/memory/splunk/pool_1
			CPU Weight: 20
			Memory Weight: 40

		pool_2:
			CPU Group: /sys/fs/cgroup/cpu/splunk/pool_2
			Memory Group: /sys/fs/cgroup/memory/splunk/pool_2
			CPU Weight: 80
			Memory Weight: 80
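For anyone who wants to check this from a script (for example as a container health check), the status output can be parsed with a bit of awk. This is a minimal sketch that assumes the output keeps the layout shown above. The function names are made up for illustration.

```shell
#!/bin/sh
# Read `splunk show workload-management-status` output on stdin and
# succeed only if it reports both "Enabled: 1" and "Supported: 1".
wlm_is_enabled() {
    awk '
        $1 == "Enabled:"   { enabled = $2 }
        $1 == "Supported:" { supported = $2 }
        END { exit !(enabled == 1 && supported == 1) }
    '
}

# Print one line per workload pool: "<pool> <cpu_weight> <memory_weight>".
wlm_pool_weights() {
    awk '
        /Workload Pools:/ { in_pools = 1; next }
        # A single-field line ending in ":" inside the pools section is a pool name.
        in_pools && NF == 1 && $1 ~ /:$/ { pool = substr($1, 1, length($1) - 1) }
        in_pools && $1 == "CPU" && $2 == "Weight:" { cpu = $3 }
        in_pools && $1 == "Memory" && $2 == "Weight:" { print pool, cpu, $3 }
    '
}
```

Usage would be something like `"$SPLUNK_HOME/bin/splunk" show workload-management-status | wlm_is_enabled && echo "WLM is on"`.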

dadux, Jun 01 '20 03:06

The documentation still mentions configuring WLM on non-systemd Linux, even for Splunk 8:

https://docs.splunk.com/Documentation/Splunk/8.0.4/Workloads/Configurenonsystemd

But I cannot get the pre-flight checks to pass.

dadux, Jun 01 '20 03:06

Hi @dadux, just wanted to say that we're looking into this and prioritizing the work. For the solution that you posted (the cgroups pre-task), was that for 7.2.10 or for 8+?

bb03, Jun 03 '20 20:06

Hi @bb03 - the solution I posted is for 7.2.10.

I've also managed to build my own container for 8.0.4 and run Splunk under systemd (inside the container!). Once Splunk is running under systemd, enabling WLM is no problem. 🎉
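If you try the same thing, it may help to verify the unit file against the three properties the preflight checks complain about (`Delegate`, `Restart`, and the `_internal_launch_under_systemd` argument in `ExecStart`) before starting Splunk. A rough sketch follows. The default path `/etc/systemd/system/Splunkd.service` is an assumption, so pass your own path if it differs.

```shell
#!/bin/sh
# Check a systemd unit file for the properties named in the WLM preflight
# messages. Prints one line per problem and returns non-zero if any is found.
check_splunk_unit() {
    unit="${1:-/etc/systemd/system/Splunkd.service}"   # assumed default path
    if [ ! -f "$unit" ]; then
        echo "Unit file $unit is missing"
        return 1
    fi
    problems=0
    if ! grep -Eq '^[[:space:]]*Delegate[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$unit"; then
        echo "Delegate is not set to true"
        problems=1
    fi
    if ! grep -Eq '^[[:space:]]*Restart[[:space:]]*=[[:space:]]*always[[:space:]]*$' "$unit"; then
        echo "Restart is not set to always"
        problems=1
    fi
    if ! grep -Eq '^[[:space:]]*ExecStart[[:space:]]*=.*_internal_launch_under_systemd' "$unit"; then
        echo "ExecStart does not include _internal_launch_under_systemd"
        problems=1
    fi
    return "$problems"
}
```

Calling `check_splunk_unit /path/to/Splunkd.service` prints nothing and returns 0 when all three properties are present.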

I'm creating a PR so you can have a look at the changes I made to get it working.

dadux, Jun 04 '20 02:06