docker-splunk
Upgrade 7.2.6 -> 7.3.0 fails
Expected: I should be able to upgrade from the 7.2.6 tag to the 7.3.0 tag.
Observed: The upgrade fails with "Permission denied" when performing a stat on {{ splunk_home }}/etc/auth/splunk.secret.
Repro Steps
- docker-compose up -d
- docker-compose logs splunkenterprise --follow
- Wait for splunkweb to be available
- Change the image tag in the compose file to ":7.3.0" or ":edge"
- Add the env var SPLUNK_UPGRADE="true"
- docker-compose up -d
- docker-compose logs splunkenterprise (the full sequence is condensed into one shell session below)
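The same steps as one shell session (a sketch; it assumes the compose file shown below and the service name splunkenterprise):
docker-compose up -d
docker-compose logs splunkenterprise --follow   # wait here until splunkweb answers on port 8000
# edit docker-compose.yml: change the image tag to splunk/splunk:7.3.0 (or :edge)
# and uncomment SPLUNK_UPGRADE: "true" in the environment block
docker-compose up -d
docker-compose logs splunkenterprise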
Docker Compose File
# docker run splunk/enterprise:7.0.3
# Options on how to review the EULA and accept it:
# 1. docker run -it splunk/enterprisetrial:7.0.3
# 2. Add the following environment variable: SPLUNK_START_ARGS=--accept-license
#    e.g., docker run -e "SPLUNK_START_ARGS=--accept-license" splunk/enterprisetrial
# Support for Docker Compose v3, https://docs.docker.com/compose/overview/
version: '3'
volumes:
  opt-splunk-etc:
  opt-splunk-var:
services:
  splunkenterprise:
    #build: .
    hostname: splunkenterprise
    #image: splunk/splunk:7.2.6
    image: splunk/splunk:7.3.0
    #image: splunk/splunk:edge
    environment:
      SPLUNK_START_ARGS: --accept-license --answer-yes
      DEBUG: "true"
      ANSIBLE_EXTRA_FLAGS: "-vvvv"
      SPLUNK_PASSWORD: 'icanhazpasswd'
      # SPLUNK_UPGRADE: "true"
      SPLUNK_ENABLE_LISTEN: 9997
      SPLUNK_ADD: tcp 1514
    volumes:
      - opt-splunk-etc:/opt/splunk/etc
      - opt-splunk-var:/opt/splunk/var
    ports:
      - "8000:8000"
      - "9997:9997"
      - "8088:8088"
      - "1514:1514"
This should be fixed in the latest splunk/splunk:edge images after https://github.com/splunk/splunk-ansible/pull/211 was merged. Closing this, but feel free to reopen.
I am still seeing this issue when upgrading from 7.2.4 to splunk:7.3, splunk:7.3.1 or splunk:edge.
[root@ip-10-129-2-126 ec2-user]# docker logs 3460619d9414
PLAY [Run default Splunk provisioning] *****************************************
Thursday 22 August 2019 20:51:34 +0000 (0:00:00.026) 0:00:00.026 *******
TASK [Gathering Facts] *********************************************************
ok: [localhost]
Thursday 22 August 2019 20:51:36 +0000 (0:00:01.543) 0:00:01.569 *******
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.024) 0:00:01.594 *******
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.022) 0:00:01.617 *******
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.092) 0:00:01.710 *******
included: /opt/ansible/roles/splunk_common/tasks/get_facts.yml for localhost
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.041) 0:00:01.751 *******
TASK [splunk_common : Set privilege escalation user] ***************************
ok: [localhost]
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.023) 0:00:01.774 *******
TASK [splunk_common : Check for existing installation] *************************
ok: [localhost]
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.219) 0:00:01.994 *******
TASK [splunk_common : Set splunk install fact] *********************************
ok: [localhost]
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.024) 0:00:02.019 *******
TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Permission denied"}
PLAY RECAP *********************************************************************
localhost : ok=5 changed=0 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0
Thursday 22 August 2019 20:51:36 +0000 (0:00:00.086) 0:00:02.105 *******
===============================================================================
Gathering Facts --------------------------------------------------------- 1.54s
splunk_common : Check for existing installation ------------------------- 0.22s
Provision role ---------------------------------------------------------- 0.09s
splunk_common : Check for existing splunk secret ------------------------ 0.09s
splunk_common : include_tasks ------------------------------------------- 0.04s
Determine captaincy ----------------------------------------------------- 0.02s
splunk_common : Set splunk install fact --------------------------------- 0.02s
splunk_common : Set privilege escalation user --------------------------- 0.02s
Execute pre-setup playbooks --------------------------------------------- 0.02s
splunkd is not running.
@nwang92 Is there anything I can do to help figure out why this is still happening?
@johan1252 Can you give us your reproduction steps?
@bb03 I stopped the container running splunk:7.2.4, then launched the new 7.3.1 container like so, where /splunk-data is the same volume that was used with the old 7.2.4 container.
docker run -d -p 8000:8000 -p 9997:9997 -e 'SPLUNK_START_ARGS=--accept-license --answer-yes' -e 'SPLUNK_PASSWORD=<admin password>' -v /splunk-data:/opt/splunk/etc -v /splunk-data:/opt/splunk/var splunk/splunk:7.3.1
I then captured the log output above with docker logs.
I was able to get around the issue by running chmod 777 auth/ ; chmod 777 auth/splunk.secret.
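With the bind mount used in the command above (/splunk-data mounted at /opt/splunk/etc), the host-side equivalent of that workaround is roughly:
chmod 777 /splunk-data/auth
chmod 777 /splunk-data/auth/splunk.secret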
I've also had this issue upgrading from 7.2.4 to anything above 7.2.4, using Kubernetes StatefulSets. After updating the image tag in the StatefulSet and redeploying, the recreated pods immediately fail with this error. I'm also not very comfortable with the chmod 777 workaround above.
Currently the /opt/splunk/etc/auth directory looks like this:
drwx------. 7 splunk splunk 4096 Feb 6 2019 auth
and /opt/splunk/etc/auth/splunk.secret looks like this:
-r--------. 1 splunk splunk 255 Mar 4 2019 splunk.secret
As far as I can tell, that should be enough for the Splunk user that Ansible runs as to check for the file's existence. However, I continue to see the error:
TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {
"changed": false
}
MSG:
Permission denied
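One way to tell whether this is a UID mismatch rather than a mode problem is to compare the user the provisioning actually runs as against the numeric owner of the file; a sketch for a StatefulSet pod (assuming the pod stays up long enough to exec into, and <pod-name> is a placeholder):
kubectl exec <pod-name> -- id
kubectl exec <pod-name> -- ls -ln /opt/splunk/etc/auth/splunk.secret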
I reproduced the issue by running 8.0.0 over 7.2.4 (stop the 7.2.4 container, remove it, run 8.0.0); the error is the same. chmod 777 on the auth folder and the splunk.secret file works around it, but I don't think that is an appropriate fix.
Here is the error message:
Wednesday 27 November 2019 12:13:25 +0300 (0:00:00.042) 0:00:03.629 ****
TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {
"changed": false
}
MSG:
Permission denied
PLAY RECAP *********************************************************************
localhost : ok=5 changed=0 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0
Why is this issue marked as closed? There is no real fix yet, only a workaround. @nwang92 @bb03 @jmeixensperger What should the correct permissions be?
The fix mentioned above was overridden by another change that switched to using 'splunk.user':
https://github.com/splunk/splunk-ansible/commit/4ab2caa4339457211f45704e5b82cfeb6dd8c9af
That second change is what actually causes the problem: in the old Splunk image the splunk user's UID is 999, but in the new image the ansible user has UID 999 and the splunk user has UID 41812.
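If that UID change is the cause, a narrower workaround than chmod 777 is to re-own the persisted data to the UID the new image expects before starting the new container. A sketch for a host bind mount (41812 is the UID quoted above; the matching GID is an assumption, and the path is only an example):
# re-own everything Splunk persists to the new splunk UID/GID
chown -R 41812:41812 /splunk-data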
Sorry I seem to have stopped getting emails about this issue until the last comment. Let me take a look at the 7.2.x to 8.x upgrade path.
Regarding @jonnyyu's comment: the old Splunk image is not compatible with the new Splunk image: https://github.com/splunk/docker-splunk/blob/develop/docs/INTRODUCTION.md#history
Is the issue that the container fails to recreate, or is the problem with the Splunk behavior once the image has been updated?
I'm asking because I'm seeing the latter right now. The Docker image itself ships with an Enterprise trial license, which does seem to expire after a certain amount of time. I don't quite know how it checks or when the expiry is scheduled to occur, because it doesn't seem very consistent.
Using this command:
docker run -d -p 8000:8000 -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=helloworld splunk/splunk:<tag>
I started Splunk using the following tags:
- 7.2.0
- 7.3.2
- 7.3.4
For the above tags, the trial license you get on a first-time run is already expired. The only way to work around this would be to acquire a new, valid Splunk license and replace it, or convert the installation to Splunk Free.
Trial license expiration date per tag:
- 7.2.0: 3/8/2020
- 7.3.2: 3/8/2020
- 7.3.4: 5/8/2020
cc @mikedickey - I was under the impression that each time you start the container image you get a new trial, but it looks like that's not the case (maybe for good reason). I'm planning on documenting this at the very least, but is there anything you recommend we do to mitigate the impact on users who are affected by this?
@nwang92 As we discovered today, this is not a problem related to containers but rather how the Splunk Enterprise trial license works. The same problem exists for all deployment methods.
Summary is that all Splunk Enterprise trial licenses expire on a specific hard-coded date which is included in the original package files. March 8, 2020 happened to be one of those dates. Fix is to always use the latest version, or provide your own license.
Note that 7.3.4, as well as 8.0.x and all newer maintenance releases, do not have this problem (or at least, it is postponed until 2029). It says expiration of 5/8 because that is 60 days from today.
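If you want to confirm what a running instance thinks, the licenser REST endpoint reports the expiration time of each installed license; a rough check (admin credentials and the default management port are assumptions about your setup):
# expiration_time values are epoch seconds
curl -sk -u admin:<password> https://localhost:8089/services/licenser/licenses | grep -i expiration_time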
I ran into the same Permission denied error message, but the issue went away after I ran docker-compose down and then docker-compose up --build again. I'm not a Docker expert, so I can't explain why it worked.
I solved the problem on my end by creating a splunk user with UID 41812 on the Ubuntu host where I run Docker. I was not able to run Splunk in Docker as any user other than root. environment: SPLUNK_USER: splunk
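For anyone wanting to replicate that, the host-side steps amount to roughly this (41812 is the UID from the earlier comment about the new image; the data path is only an example):
# create a host user whose UID matches the splunk user inside the new image
useradd --uid 41812 --no-create-home splunk
# if the persisted data is still owned by the old UID, re-own it as well
chown -R splunk:splunk /splunk-data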