docker-splunk
UF Crashes on Container Restart (9.2 and 9.1)
When running containers on 9.2.1 (78803f08aabb) or 9.1.4 (a414fc70250e), a restarted container fails to start with the following error:
TASK [splunk_universal_forwarder : Setup global HEC] ***************************
fatal: [localhost]: FAILED! => {
"changed": false
}
MSG:
POST/services/data/inputs/http/httpadmin********8089{'disabled': '0', 'enableSSL': '1', 'port': '8088', 'serverCert': '', 'sslPassword': ''}NoneNoneNone;;; AND excep_str: No Exception, failed with status code 404: {"text":"The requested URL was not found on this server.","code":404}
PLAY RECAP *********************************************************************
localhost : ok=67 changed=3 unreachable=0 failed=1 skipped=69 rescued=0 ignored=0
Thursday 18 April 2024 14:49:02 +0000 (0:00:00.588) 0:00:17.478 ********
===============================================================================
splunk_common : Start Splunk via CLI ------------------------------------ 1.59s
Gathering Facts --------------------------------------------------------- 0.95s
splunk_universal_forwarder : Setup global HEC --------------------------- 0.59s
splunk_common : Cleanup Splunk runtime files ---------------------------- 0.51s
splunk_common : Update Splunk directory owner --------------------------- 0.48s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.43s
splunk_common : Check for scloud ---------------------------------------- 0.41s
splunk_common : Set mgmt port ------------------------------------------- 0.40s
splunk_common : Find manifests ------------------------------------------ 0.38s
splunk_common : Check if UDS file exists -------------------------------- 0.32s
splunk_common : Configure to set Mgmt Mode as auto (Allows UDS) --------- 0.30s
splunk_common : Remove user-seed.conf ----------------------------------- 0.30s
splunk_common : Reset root CA ------------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Disable indexing on the current node -------------------- 0.29s
splunk_common : Ensure license path ------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Create .ui_login ---------------------------------------- 0.29s
splunk_common : Check if /sbin/updateetc.sh exists ---------------------- 0.29s
splunk_common : Enable splunktcp input ---------------------------------- 0.29s
9.0.9 (6315942c563f) appears unaffected.
Hi @JoePJisc,
I assume this happens on freshly installed UFs, not on upgrades?
I had the same error and it turned out that this was caused by SPLUNK_HOME_OWNERSHIP_ENFORCEMENT - see SECURITY.md.
When you run a newer UF as the container user splunk, it prints a lot of warnings that it is not working properly. These are only warnings, so nothing actually fails.
In this play, however, the warning turns into a problem: https://github.com/splunk/splunk-ansible/blob/develop/roles/splunk_common/tasks/enable_admin_auth.yml#L6
The initial Splunk admin user setup processes stdout, so the warning text is picked up along with it and results in a broken passwd file:
[splunk@splunk-uf-0 splunkforwarder]$ pwd
/opt/splunkforwarder
[splunk@splunk-uf-0 splunkforwarder]$ cat etc/passwd
:admin:Warning: Attempting to revert the SPLUNK_HOME ownership::administrator:admin:::19853
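To make the breakage easier to spot, here is a minimal Python sketch that checks whether the hash field of such an entry looks plausible. The helper names `admin_hash_field` and `looks_corrupted` are hypothetical. The colon-separated layout (`:<user>:<hash>:...`) is inferred from the entry above, and the check assumes a healthy entry stores a crypt-style hash starting with `$`. It is not an official Splunk validation.

```python
def admin_hash_field(passwd_line: str) -> str:
    """Return the field that is expected to hold the password hash."""
    # Layout inferred from the entry above: ':<user>:<hash>:...'
    fields = passwd_line.strip().split(":")
    return fields[2] if len(fields) > 2 else ""


def looks_corrupted(passwd_line: str) -> bool:
    """Flag entries whose hash field does not look like a crypt-style hash."""
    # Assumption: a healthy entry stores a hash that starts with '$'.
    return not admin_hash_field(passwd_line).startswith("$")


# The broken entry from the container, where warning text replaced the hash.
broken = ":admin:Warning: Attempting to revert the SPLUNK_HOME ownership::administrator:admin:::19853"
print(looks_corrupted(broken))
```

In the broken entry the hash field contains the word `Warning` instead of a hash, so the check flags it.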
I fixed this by overriding the play as follows:
---
- name: Set admin access via seed
  when: first_run | bool
  block:
    - name: "Hash the password"
      command: "python -c 'import sys, crypt; print(crypt.crypt(sys.argv[1], crypt.mksalt(crypt.METHOD_SHA512)))' '{{ splunk.password }}'"
      register: hashed_pwd
      changed_when: hashed_pwd.rc == 0
      become: yes
      become_user: "{{ splunk.user }}"
      no_log: "{{ hide_password }}"
That solved it for me. Maybe it helps you as well!
Anyway, the root cause is ultimately the issues with SPLUNK_HOME_OWNERSHIP_ENFORCEMENT, and I will create an issue to address those.